ABSTRACT

Iterative error correction of asymptotically large associative memories is equivalent to a 
one-step learning rule. This rule is the inverse of the activation function of the memory. 
Spectral representations of nonlinear activation functions are used to obtain the inverse in 
closed form for Sparse Distributed Memory (Kanerva, 1988), Selected-Coordinate Design
(Jaeckel, 1989), and Radial Basis Functions (Poggio, 1989).
1.0 Introduction

In this report, I study issues governing learning in associative memories, which use
patterns to recall other patterns. The patterns can be simple or complex, composed of one or
many features. The features can refer to sensory input or high-level mental constructs. The
paper is theoretical and treats these patterns as abstract vectors in n-dimensional spaces.
There are many questions that can be asked about such systems; I focus on the
following.

How can precise patterns be recalled when information is distributed throughout an
associative memory? In what way can the effects of interference between patterns in the
memory be undone? When an associative memory is implemented as an artificial neural
network, what learning rule will produce high capacity and precise recall? How does the
issue of generalization relate to the issue of storage capacity?

These questions are examined for associative memories which have an unlimited number
of memory locations. This extreme case reveals that there exist very efficient single-step
learning rules for high quality recall. The extreme-case rules are precisely determined yet
are neither simple nor intuitively obvious. I examine these rules in an effort to understand
the basic processes governing the behavior of distributive systems.

I have found this investigation to be insightful, for it indicates the interplay between
storage¹ and recall necessary to realize high quality memories. The picture that emerges is an
onion of alternating layers of excitation and inhibition about each neuron of the memory 
(hence the logo on the title page). This picture is reminiscent of the on-center off-surround 
network found in the visual cortex of higher vertebrates (Malsburg, 1973). Such structures 
are necessary for producing sharp edges in distributive systems. Sharp edges are necessary 
for precise discrimination in small regions when there are high correlations between dis- 
parate input patterns. 

The asymptotic results derived raise the question of whether there exist analogous rules 
for nonasymptotic memories. This question is left to a later publication (Danforth, 1991). 

1.1 An example of total recall 

One might well ask whether it is even possible to realize high quality recall in a distribu- 
tive system. To illustrate that it is possible to realize perfect recall consider the following 
simple example. 

Assume an associative memory has input x, where x is a pattern. For this example assume
the pattern is binary valued. Also assume there are but two bits in x (x_1 and x_2). Assume
that for each of the four possible inputs the output patterns y = A, B, C, and D are
associated. It is not necessary to specify what these patterns are or what their
dimensionality is, only that they are numeric. They do not have to be distinct. Assume there
is a memory location for every possible input pattern (the asymptotic case). The specified
associations are depicted below.

1. The inverse rules derived here can equally well be applied to the recall process. If recall uses the inverse
then learning uses the rule and vice versa.

In a distributive memory, when the pattern y = A is associated with the pattern x = (0, 0), it
is not placed just at (0, 0) but is written at several locations. This is partly motivated by the
desire for fault tolerance under hardware failure. See Figure 2.

The rule used for the distribution of pattern A in Figure 2 writes it in all locations within a
Hamming distance of one of (0, 0). If this rule is applied to each of the patterns A, B, C, and
D to be written into memory, the following configuration results (Figure 3).
Figure 3 Multiple writes.


The contents of a memory location are taken to be the sum, e.g. A + B + C, of the patterns 
written into it. Memory locations accumulate information by adding to their present store. 
How can the desired pattern be untangled when this memory is read? If the same rule is 
used for writing and reading, one would expect to read from all locations within one bit of, 
say, (0, 0). How is this done? The contents can be pooled (summed) to produce 

y = M(0,0)

  = (A + C + D) + (A + B + C) + (A + B + D)

  = 3A + 2B + 2C + 2D.


It is not at all obvious that the pattern A which we wish to recover from location x = (0, 0)
will predominate in this sum. In fact, if all of the other patterns in memory are actually
equal to Q (B = C = D = Q), for some Q, then y = 6Q + 3A and the pattern Q
predominates over the pattern A (six versus three).
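
The distribute-and-pool arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`hamming`, `write_all`, `read`) are hypothetical, the patterns are tracked symbolically as label coefficients rather than as numeric vectors, and the assignment of B, C, and D to addresses follows the order given in the text.

```python
from itertools import product

# All two-bit addresses: the dense (asymptotic) memory of the example.
ADDRESSES = list(product((0, 1), repeat=2))

def hamming(u, v):
    """Number of positions where two bit tuples differ."""
    return sum(a != b for a, b in zip(u, v))

def write_all(data, radius=1):
    """Add each pattern's label to every location within `radius` of its address.

    The contents of a location are kept symbolically as {label: coefficient}.
    """
    memory = {loc: {} for loc in ADDRESSES}
    for address, label in data.items():
        for loc in ADDRESSES:
            if hamming(address, loc) <= radius:
                memory[loc][label] = memory[loc].get(label, 0) + 1
    return memory

def read(memory, x, radius=1):
    """Pool (sum) the contents of every location within `radius` of x."""
    pooled = {}
    for loc in ADDRESSES:
        if hamming(x, loc) <= radius:
            for label, coeff in memory[loc].items():
                pooled[label] = pooled.get(label, 0) + coeff
    return pooled

# Assumed assignment of patterns to addresses.
data = {(0, 0): "A", (0, 1): "B", (1, 0): "C", (1, 1): "D"}
memory = write_all(data)
pooled = read(memory, (0, 0))
print(pooled)  # coefficients of A, B, C, D in the sum read at (0, 0)
```

Reading at (0, 0) gives the coefficients 3, 2, 2, 2, so the three interfering patterns together outweigh A whenever they agree with one another.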

So it appears that distribution followed by pooling is not a good way to store and recover 
patterns precisely. However, one might think the pattern Q should be what is read from lo- 
cation (0,0). The predominance of Q over A indicates A may be in “error”. To accept Q is 
to allow the memory to correct errors by smoothing local irregularities. But what if the da- 
tum A is not error but signal? Then the signal has been lost in the noise of Q. 

I will examine the case where all data are considered signal to be retained. For well-sepa- 
rated data the distribute-and-pool process meets the needs of robustness and generaliza- 
tion. If the data are not well separated then noise can swell to overcome the signal. 

Is there an alternative way to distribute information such that, on recall, the signal pre- 
dominates? The answer to this question is yes, and the way this distribution and recall pro- 
cess interact is illuminating. 
Consider Figure 4 where the write rule has been changed. Locations within one bit of
(0, 0) now receive the pattern A/3 whereas the location two bits away in Hamming distance
receives twice the negative of this value. Applying this new rule to each of the remaining 
patterns produces Figure 5. 


Figure 4 A new write rule.


x_2

 1    (A + C + D - 2B)/3    (B + C + D - 2A)/3

 0    (A + B + C - 2D)/3    (A + B + D - 2C)/3

              0                     1            x_1

Figure 5 Multiple writes using new rule.
If the memory is now read at (0, 0) with the original pooling rule (the contents of all
locations within a Hamming distance of one are added), then

y = M(0,0)

  = ((A + C + D - 2B) + (A + B + C - 2D) + (A + B + D - 2C))/3
  = (A + A + A + B + B - 2B + C + C - 2C + D + D - 2D)/3
  = (3A + 0 + 0 + 0)/3
  = A


and the desired signal is recovered without error. The patterns B, C, and D at (0, 1),(1, 0), 
and (1,1) are likewise recovered exactly. So by changing the write rule but retaining the 
read rule it is possible to compensate for the process of distributing information within the 
memory. How was the write rule determined? The answer to this question is the main top- 
ic of this report. 
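
As a check on this example, the following Python sketch applies the modified write rule (weight 1/3 at Hamming distances 0 and 1, weight -2/3 at distance 2) and then reads with the unchanged pooling rule. The helper names are hypothetical and the patterns are again tracked symbolically; exact fractions are used so that cancellation is not blurred by rounding.

```python
from fractions import Fraction
from itertools import product

ADDRESSES = list(product((0, 1), repeat=2))

def hamming(u, v):
    """Number of positions where two bit tuples differ."""
    return sum(a != b for a, b in zip(u, v))

# Write weight as a function of Hamming distance from the pattern's own address.
WRITE_WEIGHT = {0: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(-2, 3)}

def write_all(data):
    """Distribute every pattern over all four locations with the new weights."""
    memory = {loc: {} for loc in ADDRESSES}
    for address, label in data.items():
        for loc in ADDRESSES:
            weight = WRITE_WEIGHT[hamming(address, loc)]
            memory[loc][label] = memory[loc].get(label, 0) + weight
    return memory

def read(memory, x, radius=1):
    """Original pooling rule: sum all locations within `radius` of x."""
    pooled = {}
    for loc in ADDRESSES:
        if hamming(x, loc) <= radius:
            for label, coeff in memory[loc].items():
                pooled[label] = pooled.get(label, 0) + coeff
    # Drop labels whose contributions cancelled exactly.
    return {label: c for label, c in pooled.items() if c != 0}

data = {(0, 0): "A", (0, 1): "B", (1, 0): "C", (1, 1): "D"}
memory = write_all(data)
recalled = {x: read(memory, x) for x in ADDRESSES}
print(recalled)
```

Every address returns only its own pattern with coefficient one, which is the exact recovery described above.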

It can be argued that shuffling information around in a memory which spans all possible 
input patterns is not very enlightening since it is precisely the vastness of the number of 
these patterns ($2^{100}$, say) that makes it impossible to construct such a memory. Having
said this, it is still beneficial to understand what form the shuffling and encoding must take 
in order to retrieve patterns from a dense memory (one with all possible locations present). 
It will be shown that knowledge of this dense case provides information about the ensem- 
ble average of sparse memories. It will also be shown that error correction in artificial 
neural nets is related to this process of weighting and shuffling of information. In addition, 
the analysis of the asymptotic case provides a concrete place to stand for further analysis 
of smaller implementable memories. 
2.0 Overview

This report develops a methodology for constructing single-step write-rules for asymptot- 
ically large single-hidden-layer artificial neural systems based on a spectral representation 
of their nonlinear activation functions. 

In section 3.0 notation is developed for reading and writing using nonlinear operators. 

In section 4.0 it is shown that repeated writing of errors into a memory implements a series 
expansion of the inverse of its activation function. 

In section 5.0 the general notation of the spectral representation for activation rules is
introduced and the general form of the inverse derived from it.

In section 6.0 the inverse is determined for Sparse Distributed Memory (Kanerva, 1988). 

In section 7.0 the Selected-Coordinate Design (Jaeckel, 1989) is discussed and its inverse 
derived. 

In section 8.0 Radial Basis Functions (Poggio, 1989) are introduced and an expression for 
their inverse is presented. 

The report ends with section 9.0 giving a summary and conclusions of this research. 
3.0 Notation

An associative memory is a system for constructing input-output mappings from examples
of input-output pairs. The memory has a construct called a location with a set of input
weights, $\hat{x}$, and a set of output weights, $\hat{y}$. The vector $\hat{x}$ corresponds to the address of
the location in a Random Access Memory (RAM) and the vector $\hat{y}$ corresponds to the data
stored at the location. A location is activated by an input pattern x if the address decoder or
activation function, A, deems that the pattern is sufficiently similar to the address $\hat{x}$ of the
location. If the location is activated, information $\hat{y}$ either can be stored at the location
(weights $\hat{y}$ modified) or retrieved from the location (weights $\hat{y}$ contribute to the output
from memory). The activation rule, A, is the kernel or fundamental aspect that characterizes
an associative memory. This activation rule will be written symbolically to bring forth
its functional dependence on input, x, address, $\hat{x}$, and dimensionality of the input space, n.
When specializing to specific forms for Sparse Distributed Memory and Selected-Coordinate
Design, other parameters will be introduced. The activation rule is written as

$$a = A_{x,\hat{x}}(n). \tag{EQ 1}$$

By writing the activation rule as a function of two arguments (the input vector and the
input weights), a very broad class of functions can be represented. It includes the standard
linear threshold unit where the inner product of x and the address $\hat{x}$ is compared to a
threshold. It also includes functions that may depend in a complex manner upon the
interplay between input and weights. The result of activation, the quantity a, can be binary
valued or real valued (e.g., sigmoid functions). The unspecified complexity of A has the
potential of modeling not just a node in a single hidden layer but also a node deep within a
multilayer system when the dimensionality of $\hat{x}$ is allowed to be far larger than that of x.
In this paper it is assumed A is the activation rule for a memory with a single hidden layer.

Although the results of activation are treated linearly, it needs to be stressed that the acti- 
vation rule A can be a highly nonlinear function of its arguments. Many of the results de- 
rived in this paper are independent of the exact form of this nonlinearity. 


3.1 Sparse sampling 

The pairing of input pattern, x, with output pattern, y, leaves unspecified whether the pair
is actually observed in any finite sample. A notation is adopted here that depicts y as a
direct function of x, namely, $y_x$ (for discrete input spaces) or y(x) (for continuous input
spaces). This notation is taken to mean that at any input point x, $y_x$ is the observed output
at that point. If x does not occur in the sample, $y_x$ is zero. If x occurs multiple times in the
sample, $y_x$ is the sum of the y's at that point. In like manner $\hat{y}_{\hat{x}}$ is the sum of the data
stored at address $\hat{x}$. If there is no location in memory with address $\hat{x}$, $\hat{y}_{\hat{x}}$ is the zero
pattern. With this understanding, the patterns x and $\hat{x}$ are allowed to range over all their
possible values with $y_x$ and $\hat{y}_{\hat{x}}$ constraining the associated values to those observed.
3.2 Projections

The issue of sparse data sampling can be specified cleanly in terms of projection
operators. I define the operator $\Pi$ to be the projection of the space onto the subspace
spanned by sample data y. The action of $\Pi$ on y leaves y unchanged. Since the operator is a
projection it is idempotent (repeated application of a projection is the same as a single
application). So

$$\Pi y = y, \qquad \Pi \Pi = \Pi. \tag{EQ 2}$$

For discrete spaces, one can think of the operator $\Pi$ as the identity matrix with holes along
the diagonal. The nonzero diagonal elements specify the input patterns that occur in the
data.

In like manner, sparse memory locations can be represented as a projection, $\hat{\Pi}$, of the full
space of all possible memory locations onto those actually occurring in the memory, $\hat{y}$. We
then have

$$\hat{\Pi} \hat{y} = \hat{y}, \qquad \hat{\Pi} \hat{\Pi} = \hat{\Pi}. \tag{EQ 3}$$

These projection operators are mentioned here for completeness of exposition. They will 
be sparingly used in the body of this report but will play a stronger role in future research 
directed at the issue of generalization in sparse memory systems. 

3.3 Recall 

The process of recall from a distributive memory is taken to be a weighted summing of
information from the locations of the memory conditionalized on the input pattern x. The
weighting is determined by the activation rule A where

$$s_x = \sum_{\hat{x}} A_{x,\hat{x}}\, \hat{y}_{\hat{x}}. \tag{EQ 4}$$

The result of this weighted pooling is a pattern, $s_x$, called the sums. The sums may be
further processed for other stages of analysis or for cascading memories. In this work, the
sums are considered the output from the memory.

If x is an observation point with $y_x$ then for perfect recall it is desired to have $s_x = y_x$.

3.4 Storage 

The process of storage in a distributive memory is taken to be a weighted summing of
information from observations conditionalized on the input pattern x'. The weighting is
determined by an activation rule B (to be determined in terms of A) where
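$$\hat{y}_{\hat{x}} = \hat{\Pi}_{\hat{x},\hat{x}} \sum_{x'} B_{\hat{x},x'}\, y_{x'} \tag{EQ 5}$$

and

$$\hat{\Pi}_{\hat{x},\hat{x}} = \begin{cases} 1 & \hat{x} \text{ present,} \\ 0 & \text{otherwise.} \end{cases} \tag{EQ 6}$$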


Each observation, $y_{x'}$, is weighted by the B activation rule and written into location $\hat{x}$ if it
is present. The quantity $\hat{y}_{\hat{x}}$ accumulates these values.

3.5 Operators 

The previous equations can be encapsulated concisely by using operator notation. By this 
I mean that for discrete spaces the quantities are considered vectors and matrices and for 
continuous spaces the quantities are functions with one or two arguments. Products are 
taken to be inner products (summed or integrated) over their appropriate spaces. 

The values stored in memory are written as

$$\hat{y} = \hat{\Pi} B y \quad \text{(writing to memory)} \tag{EQ 7}$$

and the values recalled from memory are written as

$$s = A \hat{y} \quad \text{(reading from memory)}. \tag{EQ 8}$$

The sums, s, can now be expressed directly in terms of the observations, y, as

$$s = A \hat{\Pi} B y. \tag{EQ 9}$$


3.6 Ensemble averages 

The operator $\hat{\Pi}$ plays a fundamental role in actual implementation of associative
memories, for it specifies where in address space (weight space) hidden nodes are placed. If
these locations are randomly chosen subject only to the constraint that the expected number
of them is fixed, then the expected behavior of the associative memory can be derived. Let p
be the probability that a specific address will exist in a uniformly randomly chosen memory.
The expected value of $\hat{\Pi}$ is then just a multiple of the identity matrix

$$E(\hat{\Pi}) = pI \tag{EQ 10}$$

for fixed positive scalar constant p. This means that, over all models with randomly chosen
hidden nodes, the expected sums for a fixed training set can be derived from the product of
the activation rule and a memory with all possible locations present. Hence, the operator
$\hat{\Pi}$ will not be considered in the remainder of this paper and the sums will be written as
$$s = A B y. \tag{EQ 11}$$

The results that follow therefore hold for uniformly randomly chosen hidden nodes or
dense memories where all possible hidden nodes are present.²
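
The ensemble-average statement can be checked exactly on a toy memory by enumerating every possible set of present locations. The sketch below makes several assumptions that are not in the text: each of the four two-bit locations is present independently with probability p, the operators A and B are the pooling and compensating write rules of the two-bit example in section 1.1, and the data are hypothetical scalars.

```python
from fractions import Fraction
from itertools import product

ADDRESSES = list(product((0, 1), repeat=2))

def hamming(u, v):
    """Number of positions where two bit tuples differ."""
    return sum(a != b for a, b in zip(u, v))

# Assumed toy operators: A pools within radius 1; B is the compensating write
# rule of the two-bit example, so A B is the identity on the dense memory.
A = [[1 if hamming(x, loc) <= 1 else 0 for loc in ADDRESSES] for x in ADDRESSES]
B_WEIGHT = {0: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(-2, 3)}
B = [[B_WEIGHT[hamming(loc, x)] for x in ADDRESSES] for loc in ADDRESSES]

def sums(mask, y):
    """s = A Pi B y, where mask[j] is 1 if location j exists in the memory."""
    stored = [mask[j] * sum(B[j][k] * y[k] for k in range(4)) for j in range(4)]
    return [sum(A[i][j] * stored[j] for j in range(4)) for i in range(4)]

p = Fraction(1, 4)    # hypothetical probability that a given location exists
y = [5, -2, 7, 1]     # hypothetical scalar data, one value per input pattern

# Average the sums over all 16 possible memories, weighted by their probability.
expected = [Fraction(0)] * 4
for mask in product((0, 1), repeat=4):
    prob = Fraction(1)
    for bit in mask:
        prob *= p if bit else 1 - p
    s = sums(mask, y)
    expected = [e + prob * v for e, v in zip(expected, s)]

print(expected)  # equals p times the dense-memory sums A B y
```

Because A B is the identity in this toy case, the expected sums are simply p times the data, which is the content of (EQ 10) applied to (EQ 9).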


3.7 Identity transformations 

If one could find a B such that AB were the identity transformation, then the sums, s, 
would be exactly the data, y, and recovery of data would occur without error. Note that 
reading at points x which were not sampled would yield the zero pattern. For systems that 
generalize this is not desirable, whereas for systems with perfect recall, it is. 

I now turn to the motivation for considering inverses arising from the practice of using er- 
ror correction in artificial neural networks. 


2. Random placement of locations for the Selected-Coordinate Design has been shown (Danforth, 1990) to 
give good results for single-talker discrete-speech digit recognition. 
4.0 Series expansion of inverse operators

A frequently used learning rule in artificial neural network research is back-propagation
(Rumelhart, 1986). It repeatedly cycles the differences (errors) between desired output and
actual output back into the network to adjust internal weights between layers of the
network.³ The form of associative memories considered in this paper has a single hidden
layer where the weights between the input and hidden layer are fixed. Only the weights
between hidden and output layer are adjustable. Therefore, knowledge of the derivative of
the activation rule is not needed since propagation of errors through more than one layer is
not used.

What I now show is when errors are used to cyclically adjust the weights in a system with 
one hidden layer where all possible hidden nodes are present, the resultant state of the 
memory is identical to that produced by a single-step inverse operator. 

The following derivation assumes (in accord with current neural net literature) the activa- 
tion rule used for recall from memory is the same one used for storage in memory. 

Let $\hat{y}(t-1)$ be the data stored at location $\hat{x}$ (with $\hat{x}$ unrestricted) at time t-1. If one
writes into this location the difference between the desired signal y and the sums generated
by a read operation with activation rule A, the value of the data at time t will be given by

$$\hat{y}(t) = \hat{y}(t-1) + A^T e(t-1) \tag{EQ 12}$$

where

$$e(t-1) = y - A\,\hat{y}(t-1). \tag{EQ 13}$$

The operator $A^T$ is the write operator and is the transpose of the read operator A (the
transpose distinction is made here for purity and is usually not stipulated since most
activation rules are symmetric). The write operator weights and pools the errors from distant
patterns to a specific hidden location for modification. The read operator weights and pools
all hidden locations to produce the output sums for comparison with the desired signal, y.
Recall that y is the whole data set. That is, $y_x$ is the output pattern associated with input
pattern x. If x does not occur in the data set then $y_x$ is the zero pattern. Now (EQ 12)
represents one epoch of training (note the error signal is buffered until all values are
calculated; then every location is updated at once). Regrouping of terms gives

$$\hat{y}(t) = A^T y + (I - A^T A)\,\hat{y}(t-1). \tag{EQ 14}$$

When carried to time t=0, this recursion yields the sum

$$\hat{y}(t) = \sum_{\tau=0}^{t-1} (I - A^T A)^{\tau} A^T y \tag{EQ 15}$$


3. A similar technique is used by Prager (1989) to adjust the weights of an associative memory. 
where it has been assumed the memory is filled with the zero pattern at time zero. In the
limit of an infinite number of epochs the above series expansion is formally equivalent to

$$\begin{aligned}
\hat{y} &= \lim_{t \to \infty} \hat{y}(t), \\
&= \sum_{\tau=0}^{\infty} (I - A^T A)^{\tau} A^T y, \\
&= [I - (I - A^T A)]^{-1} A^T y, \\
&= [A^T A]^{-1} A^T y, \\
&= A^{-1} y.
\end{aligned} \tag{EQ 16}$$

That is, if the series converges then it converges to the inverse of the operator. This formal
argument reveals the relationship between writing errors into an associative memory and
the use of an inverse operator for the learning rule. They are equivalent. This equivalence
holds only when all possible hidden locations are present (no restriction on domain of $\hat{x}$)⁴
and an infinite number of error-correction cycles is used. It indicates there exist strategies
(inverse rules) that lead to very rapid (a single epoch) learning of specific patterns.


4. For sparse sampling and sparse memory there is an equivalent statement concerning the matrix derived
from the activation rule evaluated on the lattice of observation points.
5.0 Spectral representation of nonlinear operators


Given an activation rule, A, for an associative memory, one wishes to find a second rule,
B, which will act as the inverse of A. This is expressed as

$$\sum_{\hat{x}} A_{x,\hat{x}}\, B_{\hat{x},x'} = \delta_{x,x'} \quad \text{(discrete spaces)} \tag{EQ 17}$$

or

$$\int A(x, \hat{x})\, B(\hat{x}, x')\, d\hat{x} = \delta(x - x') \quad \text{(continuous spaces)}. \tag{EQ 18}$$

The quantity $\delta$ is the Kronecker delta function for discrete spaces and is the Dirac delta
function for continuous spaces. The operator, B, acts as the write-rule into memory and
the operator A acts as the read-rule from memory. The intermediate states, $\hat{x}$, absorb
information written to them in a distributed fashion and then, during reading, recombine that
information.

The operator which cancels the effects of pooling can be determined for both discrete and
continuous spaces in terms of the associated eigenvalue problem for the operator A. The
problem is written

$$A \Psi = \Lambda \Psi \quad \text{(eigenvalue problem)} \tag{EQ 19}$$

where $\Psi$ is the eigenvector associated with A and $\Lambda$ is its eigenvalue⁵. For binary input
spaces of dimension n, the number of components of $\Psi$ is equal to $2^n$. For continuous
input spaces of dimension n, the eigenvalue problem is a homogeneous integral equation of
the second kind (Smithies, 1962) and the eigenvectors are eigenfunctions of n parameters.

I use eigenvector notation here with the understanding that inner products can readily be
interpreted as integrals for continuous spaces. Orthogonality is easily shown to hold for real
symmetric operators A (interchange of input pattern and address pattern leaves the value
of activation unchanged):

$$\Psi_a^T \Psi_b = \delta_{ab} \quad \text{(orthonormal eigenvectors)}. \tag{EQ 20}$$

The superscript T stands for the transpose of the vector, and the quantities a, b specify
which member in the set of eigenvectors is under consideration.

If the eigenvectors form a complete set (basis), then an arbitrary vector (function) in the
space can be written as a linear combination of them as

$$y = \sum_a c_a \Psi_a \quad \text{(eigenvector expansion)}. \tag{EQ 21}$$


5. Uppercase lambdas are written for the full eigenvalues since they are usually the weighted cumulation of
shell eigenvalues. Shell eigenvalues are written as lowercase lambdas (see section 5.2).
The identity operator is written as the sum over all outer products of the eigenvectors as

$$I = \sum_a \Psi_a \Psi_a^T. \tag{EQ 22}$$

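That this is true can be proved using the orthonormality of the eigenvectors, where

$$I y = \sum_a \Psi_a \Psi_a^T \sum_b c_b \Psi_b = \sum_{a,b} \Psi_a\, \delta_{ab}\, c_b = \sum_a c_a \Psi_a = y \quad \text{(for all } y\text{)}. \tag{EQ 23}$$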

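It also can be shown the operator A is expressible in terms of $\Psi$ and $\Lambda$ as

$$A = \sum_a \Psi_a \Lambda_a \Psi_a^T \tag{EQ 24}$$

since

$$A \Psi_b = \sum_a \Psi_a \Lambda_a \Psi_a^T \Psi_b = \sum_a \Psi_a \Lambda_a \delta_{ab} = \Lambda_b \Psi_b. \tag{EQ 25}$$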

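The inverse of A (when it exists) is now written as

$$A^{-1} = \sum_a \Psi_a \Lambda_a^{-1} \Psi_a^T \tag{EQ 26}$$

since

$$A A^{-1} = \sum_a \Psi_a \Lambda_a \Psi_a^T \sum_b \Psi_b \Lambda_b^{-1} \Psi_b^T = \sum_a \Psi_a \Lambda_a \Lambda_a^{-1} \Psi_a^T = \sum_a \Psi_a \Psi_a^T = I. \tag{EQ 27}$$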
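
The spectral form of the inverse can be exercised on a small case. The sketch below assumes a toy operator (pooling within Hamming radius 1 on two-bit patterns) and assumes its orthonormal eigenvectors are the sign patterns (-1)^(k.x)/2; both choices are illustrative and the variable names are hypothetical. Exact fractions are used so that the product with A can be compared to the identity without tolerance.

```python
from fractions import Fraction
from itertools import product

ADDRESSES = list(product((0, 1), repeat=2))
N = len(ADDRESSES)

def hamming(u, v):
    """Number of positions where two bit tuples differ."""
    return sum(a != b for a, b in zip(u, v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

# Assumed toy operator: pooling within Hamming radius 1.
A = [[Fraction(1 if hamming(x, loc) <= 1 else 0) for loc in ADDRESSES]
     for x in ADDRESSES]

# Assumed orthonormal eigenvectors: sign patterns (-1)^(k . x) / 2,
# one for each two-bit index k.
eigvecs = [[Fraction((-1) ** sum(ki * xi for ki, xi in zip(k, x)), 2)
            for x in ADDRESSES] for k in ADDRESSES]

# Eigenvalue of each normalized eigenvector: Lambda_a = Psi_a^T A Psi_a.
eigvals = [dot(psi, matvec(A, psi)) for psi in eigvecs]

# Spectral inverse: sum over a of Psi_a Lambda_a^{-1} Psi_a^T.
A_inv = [[sum(psi[i] * psi[j] / lam for psi, lam in zip(eigvecs, eigvals))
          for j in range(N)] for i in range(N)]

product_AB = [[dot(A[i], [A_inv[k][j] for k in range(N)]) for j in range(N)]
              for i in range(N)]
print(eigvals)
print(A_inv[0])
```

The first row of the computed inverse carries weight 1/3 at Hamming distances 0 and 1 and weight -2/3 at distance 2, matching the write rule of the introductory example.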

5.1 Eigenvector interpretation 

The eigenvalue problem is interesting unto itself for it specifies the set of patterns recalled
without change by read operations (except for a constant multiplier $\Lambda$). It is the collective
set of patterns that has this property and not just a single pattern $\Psi$ at x. If one were to
write into memory a collection of patterns specified by the eigenvector $\Psi_a$ then the
distributed internal representation would simply be $\Lambda_a \Psi_a$. Reading from memory would
retrieve the data multiplied by the eigenvalue squared. Therefore, there exist data sets that
are natural to an associative memory. The sets, however, are highly specific and are
unlikely to occur in any actual experimental environment. They do, however, form a basis
for arbitrary data sets and it is this property that is exploited in determining the write-rule
that is the inverse of the read-rule of the memory.


5.2 Radial kernels and shell operators

An activation rule dependent only on the distance between its arguments will be called a
radial kernel. Radial kernels can be written as

$$A_{x,\hat{x}}(n) = f(d_n(x, \hat{x})) \quad \text{(radial kernel)}. \tag{EQ 28}$$

The function f is a scalar of one argument and $d_n$ is a distance metric for vector arguments
of n dimensions.


A radial kernel can be decomposed into a linear combination of more primitive shell
operators S(n, p). A shell operator is activated with unit value when its components are
separated by a distance exactly equal to p. The shell operator is written as

$$S_{x,\hat{x}}(n, p) = \begin{cases} 1 & d_n(x, \hat{x}) = p, \\ 0 & \text{otherwise.} \end{cases} \tag{EQ 29}$$


A radial kernel is then simply a weighted sum of these shell operators with p ranging
over its permissible values as

$$A(n) = \sum_p S(n, p)\, f(p). \tag{EQ 30}$$

If one can solve the eigenvalue problem for S and the eigenvectors are not a function of p
(as will be shown to be the case) then the eigenvalues for a general radial activation rule
will simply be a linear combination of the eigenvalues of the S operator weighted by the
radial dependence function f:

$$\Lambda(n) = \sum_p \lambda(n, p)\, f(p). \tag{EQ 31}$$

Since the eigenvectors of S are not a function of p (to be shown) it follows that the
eigenvectors of S are also the eigenvectors of A and the task thereby reduces to solving the
eigenvalue problem for S. The eigenvectors of S become the fundamental basis set for
representing arbitrary memory configurations in all memories with radial activation rules.
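
The decomposition of a radial kernel into shells can be sketched for small binary spaces. The code below assumes Hamming distance as the metric and a hypothetical step function as the radial dependence; the function names (`shell`, `radial_kernel`, `from_shells`) are illustrative only.

```python
from itertools import product

def hamming(u, v):
    """Number of positions where two bit tuples differ."""
    return sum(a != b for a, b in zip(u, v))

def shell(n, p):
    """Shell operator S(n, p): 1 where two n-bit patterns are exactly p apart."""
    pts = list(product((0, 1), repeat=n))
    return [[1 if hamming(x, xh) == p else 0 for xh in pts] for x in pts]

def radial_kernel(n, f):
    """Radial kernel built directly from a radial dependence function f."""
    pts = list(product((0, 1), repeat=n))
    return [[f(hamming(x, xh)) for xh in pts] for x in pts]

def from_shells(n, f):
    """The same kernel assembled as a weighted sum of shells, as in (EQ 30)."""
    size = 2 ** n
    total = [[0] * size for _ in range(size)]
    for p in range(n + 1):
        S = shell(n, p)
        for i in range(size):
            for j in range(size):
                total[i][j] += S[i][j] * f(p)
    return total

def step(p):
    """Hypothetical radial dependence: active within distance 1."""
    return 1 if p <= 1 else 0

print(radial_kernel(3, step) == from_shells(3, step))
```

Each pair of patterns lies on exactly one shell, so the weighted sum picks out f at the pair's distance and the two constructions agree entry by entry.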

5.3 Eigenvalue problem for shell operators 

The task now at hand is to solve the following eigenvalue problem

$$S(n, p)\,\Psi(n, p) = \lambda(n, p)\,\Psi(n, p) \tag{EQ 32}$$

with S a shell operator defined by (EQ 29). The eigenvector $\Psi$ has been explicitly tagged
with n and p dependence. Now if S(n, p) commutes with S(n, p') for some other p' then
it is possible to find a common set of eigenvectors for both operators (Merzbacher, 1961).
If this is true for any two values of p it is true for all values of p and there is a single
common set of eigenvectors independent of p.

5.4 Proof shell operators commute 

Let S' be a shell operator with activation radius p'. Then S and S' are said to commute if

$$S S' = S' S \tag{EQ 33}$$

or

$$\sum_{u} S_{x,u}(n, p)\, S_{u,x'}(n, p') = \sum_{v} S_{x,v}(n, p')\, S_{v,x'}(n, p). \tag{EQ 34}$$

From the definition of S this is equivalent to requiring

$$\sum_{u} I[d_n(x,u)=p \;\&\; d_n(u,x')=p'] = \sum_{v} I[d_n(x,v)=p' \;\&\; d_n(v,x')=p] \tag{EQ 35}$$

where I is the indicator function, which equals one if its argument is true and is zero
otherwise. The indicator function on the left-hand side will be one for those points u which
are p distant from x and p' distant from x'. Does there always exist a v (for every u) that is
p' distant from x and p distant from x'? If so then the above equation will be true. The
one-to-one transformation⁶

$$u + v = x + x' \tag{EQ 36}$$

satisfies this requirement since

$$d_n(x, v) = \|x - v\| = \|x - (x + x' - u)\| = \|u - x'\| = d_n(u, x') \tag{EQ 37}$$

and

$$d_n(v, x') = \|v - x'\| = \|(x + x' - u) - x'\| = \|x - u\| = d_n(x, u). \tag{EQ 38}$$

Therefore, it has been shown the S operators commute and the eigenvalue problem can be
written as

$$S(n, p)\,\Psi(n) = \lambda(n, p)\,\Psi(n) \tag{EQ 39}$$

with a common set of eigenvectors $\Psi(n)$ independent of p.
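
For binary patterns with Hamming distance the commutation property can be checked by brute force on small n. In the sketch below the function names are hypothetical, and `reflect` is my assumed binary analogue of the transformation u + v = x + x', taken bitwise as v = x XOR x' XOR u.

```python
from itertools import product

def hamming(u, v):
    """Number of positions where two bit tuples differ."""
    return sum(a != b for a, b in zip(u, v))

def shell(n, p):
    """Shell operator S(n, p) for n-bit patterns under Hamming distance."""
    pts = list(product((0, 1), repeat=n))
    return [[1 if hamming(x, xh) == p else 0 for xh in pts] for x in pts]

def matmul(P, Q):
    """Matrix product for lists of lists."""
    cols = list(zip(*Q))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in P]

def shells_commute(n):
    """True if S(n, p) S(n, q) == S(n, q) S(n, p) for every pair of radii."""
    shells = [shell(n, p) for p in range(n + 1)]
    return all(matmul(shells[p], shells[q]) == matmul(shells[q], shells[p])
               for p in range(n + 1) for q in range(n + 1))

def reflect(x, x_prime, u):
    """Assumed binary analogue of u + v = x + x': v = x XOR x' XOR u."""
    return tuple(a ^ b ^ c for a, b, c in zip(x, x_prime, u))

print(shells_commute(3))
```

The map `reflect` swaps the two distances, d(x, v) = d(u, x') and d(v, x') = d(x, u), which is the pairing of u with v used in the proof.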


6. It should be noted boundary conditions on the space can make it impossible to satisfy this symmetry
constraint. It is assumed here this is not the case.
6.0 The Kanerva model

Pentti Kanerva (1988) has presented a sparse distributed memory model of associative 
memories that uses patterns composed of long bit strings. He derives many interesting re- 
sults from this abstract model which is based on the standard random access memory of 
present day computers. The Kanerva model is specified by an activation rule (kernel) of 
the form 

$$A_{x,\hat{x}}(n, r) = \begin{cases} 1 & d_n(x, \hat{x}) \le r, \\ 0 & \text{otherwise,} \end{cases} \tag{EQ 40}$$

for $x, \hat{x} \in \{0,1\}^n$.

The quantity r is a free parameter of the model and is called the activation radius. The
metric, $d_n$, is the Hamming distance between patterns x and $\hat{x}$ where both are binary valued
vectors. The weights, $\hat{x}$, are fixed and are called the address of a location (hidden node).
This activation rule is a radial kernel where the radial dependence function f is a step
function. It is written

$$f(p, r) = \begin{cases} 1 & p \le r, \\ 0 & \text{otherwise.} \end{cases} \tag{EQ 41}$$


This rule has a sharp discontinuity at distance r between x and $\hat{x}$. The rule is not
differentiable and so gradient descent methods which rely on first derivatives of the
activation rule for error minimization cannot be applied. This does not mean error signals
cannot be used to modify the contents of memory (see for example Prager, 1989), only that
gradient techniques cannot be used. One might be led to believe this “hard thresholding”
activation rule would make it impossible to find an inverse write rule; however, this is not
the case, as I now show. The decomposition of radial kernels into sums of shell operators can
now be specialized to the Kanerva model as

$$A(n, r) = \sum_{p=0}^{n} f(p, r)\, S(n, p). \tag{EQ 42}$$


6.1 Eigenvalue problem for the Kanerva model 

The task reduces to solving the eigenvalue problem for a shell that uses Hamming distance 
between binary vectors. 


$$S_{x,\hat{x}}(n, p) = I\!\left[\sum_{i=1}^{n} d(x_i, \hat{x}_i) = p\right] \tag{EQ 43}$$
where $x_i$, $\hat{x}_i$ are bits and d is one if they differ and is zero if they are the same. A
recursive decomposition can now be applied to S(n, p) in order to represent it in terms of
S(n - 1, p). With this decomposition it is possible to derive a set of recurrence relations for
the eigenvalues and eigenvectors for S. Now

$$S_{x,\hat{x}}(n, p) = I\!\left[d(x_n, \hat{x}_n) + \sum_{i=1}^{n-1} d(x_i, \hat{x}_i) = p\right]. \tag{EQ 44}$$


If the nth components $x_n$, $\hat{x}_n$ are equal then evaluation of the indicator function falls upon
the evaluation of the second term, which is identical to S(n - 1, p). If the nth components
differ then evaluation of the indicator function falls again on the second term but this time
with reduced activation radius equal to p - 1. These results can be summarized by
partitioning the shell operator for n dimensions into four parts, one for each of the possible
$x_n$, $\hat{x}_n$ combinations, namely

$$S(n, p) = \begin{bmatrix} S(n-1, p) & S(n-1, p-1) \\ S(n-1, p-1) & S(n-1, p) \end{bmatrix} \tag{EQ 45}$$
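
The block partition can be verified directly for small n. The sketch below assumes an ordering of patterns in which the nth bit is the slowest-varying index (here, the first coordinate produced by `itertools.product`), so that the four blocks correspond to the four combinations of nth bits; shells with a radius outside the permissible range are taken to be zero matrices. The function names are hypothetical.

```python
from itertools import product

def hamming(u, v):
    """Number of positions where two bit tuples differ."""
    return sum(a != b for a, b in zip(u, v))

def shell(n, p):
    """Shell operator S(n, p); all zeros when p is outside 0..n."""
    pts = list(product((0, 1), repeat=n))
    return [[1 if hamming(x, xh) == p else 0 for xh in pts] for x in pts]

def block_form(n, p):
    """Assemble S(n, p) from (n-1)-dimensional shells as in (EQ 45)."""
    same = shell(n - 1, p)        # nth bits agree: full radius remains
    diff = shell(n - 1, p - 1)    # nth bits differ: one unit of distance used
    top = [r1 + r2 for r1, r2 in zip(same, diff)]
    bottom = [r1 + r2 for r1, r2 in zip(diff, same)]
    return top + bottom

print(block_form(2, 1) == shell(2, 1))
```

With this ordering the recursively assembled matrix equals the directly constructed shell for every radius.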


The dimensions of the operator S(n - 1, p) are $2^{n-1}$ by $2^{n-1}$ and grow by a factor of two as
n increases by one. If one could establish the relationship between the solutions of the
eigenvalue problem for S(n, p) in terms of those for S(n - 1, p), and knowing the solutions
for n=0, one would have the solutions for all n. This indeed is what will be done. I now
state that an eigenvector for n dimensions is built up from a partition of eigenvectors of
n-1 dimensions as

$$\Psi_{\pm}(n) = \frac{1}{\sqrt{2}} \begin{bmatrix} \Psi(n-1) \\ \pm\Psi(n-1) \end{bmatrix} \tag{EQ 46}$$

Note normalization and orthogonality hold for n dimensions if they hold for n-1
dimensions, since

$$\Psi_{\pm}^T(n)\,\Psi_{\pm}(n) = \Psi^T(n-1)\,\Psi(n-1) = 1, \qquad \Psi_{\pm}^T(n)\,\Psi_{\mp}(n) = 0. \tag{EQ 47}$$

Completeness also follows if the set of eigenvectors for $n-1$ dimensions is complete. To
see this note there are $2^{n-1}$ orthonormal eigenvectors for a binary space of $n-1$ dimensions.
For each one of these, two are created for $n$ dimensions: one which is constructed by the
concatenation of equal vectors with a plus sign (and weighted by the inverse of $\sqrt{2}$) and
another which is constructed by a concatenation with a minus sign. These $2^n$ new
vectors are orthonormal (see above) and so the set spans the space. It remains to show
$\Psi_{\pm}(n)$, as given by (EQ 46), is indeed an eigenvector of the shell operator $S(n, p)$ as given
by (EQ 45). The eigenvalue problem may now be written as

$$\begin{pmatrix} S(n-1, p) & S(n-1, p-1) \\ S(n-1, p-1) & S(n-1, p) \end{pmatrix} \begin{pmatrix} \Psi(n-1) \\ \pm\Psi(n-1) \end{pmatrix} = \lambda_{\pm}(n, p) \begin{pmatrix} \Psi(n-1) \\ \pm\Psi(n-1) \end{pmatrix}. \qquad \text{(EQ 48)}$$

One can quickly see that the operator applied to the partitions yields a multiple of the full
vector since the vectors for $n-1$ dimensions are by definition eigenvectors of the operators
for $n-1$ dimensions. The upper and lower partitions yield the same relationship between the
eigenvalue for $n$ dimensions and those for $n-1$ dimensions, namely

$$\lambda_{\pm}(n, p) = \lambda(n-1, p) \pm \lambda(n-1, p-1). \qquad \text{(EQ 49)}$$


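Now (EQ 46) and (EQ 49) are the desired recurrence relations for the eigenvectors and
eigenvalues of the Kanerva model. To derive a closed form expression for them begin by
defining the eigenvector for $n = 0$ (no bits) as equal to one, hence

$$\Psi(0) \equiv 1. \qquad \text{(EQ 50)}$$

It then follows

$$\Psi_{+}(1) = \frac{1}{\sqrt{2}} \begin{pmatrix} +1 \\ +1 \end{pmatrix} \qquad \text{(EQ 51)}$$

and

$$\Psi_{-}(1) = \frac{1}{\sqrt{2}} \begin{pmatrix} +1 \\ -1 \end{pmatrix}. \qquad \text{(EQ 52)}$$

One can see as $n$ increases, vectors will be composed of strings of alternating signed 1’s.
Let’s continue this process for one more step to help reveal an analytic expression of a
vector’s components:

$$\Psi_{++}(2) = \frac{1}{2} \begin{pmatrix} +1 \\ +1 \\ +1 \\ +1 \end{pmatrix}, \qquad \text{(EQ 53)}$$

$$\Psi_{-+}(2) = \frac{1}{2} \begin{pmatrix} +1 \\ -1 \\ +1 \\ -1 \end{pmatrix}, \qquad \text{(EQ 54)}$$

$$\Psi_{+-}(2) = \frac{1}{2} \begin{pmatrix} +1 \\ +1 \\ -1 \\ -1 \end{pmatrix}, \qquad \text{(EQ 55)}$$

$$\Psi_{--}(2) = \frac{1}{2} \begin{pmatrix} +1 \\ -1 \\ -1 \\ +1 \end{pmatrix}. \qquad \text{(EQ 56)}$$

The analytic form that satisfies the vector expressions is

$$\Psi_{\alpha}(x) = 2^{-n/2}\,(-1)^{\alpha^{T} x} \qquad \text{(Kanerva model eigenvectors).} \qquad \text{(EQ 57)}$$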


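The label $\alpha$ distinguishes different eigenvectors. It is a bit string residing in the same
$n$-dimensional space as that of $x$ (the input bit pattern space). The quantity $\alpha^{T} x$ is the inner
product between the bit vectors $\alpha$ and $x$. It is the sum of the logical AND of the bits. The
$(-1)$ to this power retains only the information of whether this inner product has even or
odd parity.

To see (EQ 57) satisfies (EQ 46), let $\alpha_n$ and $x_n$ be single bits (the $n$th bit of a bit vector of
length $n$) and $\alpha'$, $x'$ bit vectors of length $n-1$. Writing the upper partition for $x_n = 0$ and
the lower partition for $x_n = 1$:

$$\Psi_{\alpha_n \alpha'}(n) = 2^{-n/2} \begin{pmatrix} (-1)^{\alpha_n \cdot 0 + \alpha'^{T} x'} \\ (-1)^{\alpha_n \cdot 1 + \alpha'^{T} x'} \end{pmatrix} \qquad \text{(EQ 58)}$$

$$= \frac{1}{\sqrt{2}} \begin{pmatrix} 2^{-(n-1)/2}\,(-1)^{\alpha'^{T} x'} \\ (-1)^{\alpha_n}\, 2^{-(n-1)/2}\,(-1)^{\alpha'^{T} x'} \end{pmatrix} \qquad \text{(EQ 59)}$$

$$= \frac{1}{\sqrt{2}} \begin{pmatrix} \Psi_{\alpha'}(n-1) \\ (-1)^{\alpha_n}\,\Psi_{\alpha'}(n-1) \end{pmatrix}. \qquad \text{(EQ 60)}$$

As $\alpha_n$ flips between zero and one the lower partition’s sign changes between +1 and -1,
thereby showing (EQ 57) does indeed satisfy the recurrence relation for the shell eigenvectors.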
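The claim that the vectors of (EQ 57) are orthonormal eigenvectors of every shell operator can be checked by brute force for small $n$. The sketch below is illustrative only; the names `psi`, `check_shell_eigenvectors` and `gram_is_identity` are assumptions introduced here, and the eigenvalue is simply read off from the first component of $S\Psi_\alpha$.

```python
from itertools import product
from math import isclose

def hamming(x, y):
    """Number of positions where two equal-length bit tuples differ."""
    return sum(a != b for a, b in zip(x, y))

def psi(alpha, x):
    """Component x of the eigenvector labeled alpha: 2^(-n/2) * (-1)^(alpha . x)."""
    parity = sum(a & b for a, b in zip(alpha, x)) % 2
    return (-1) ** parity / 2 ** (len(x) / 2)

def gram_is_identity(n):
    """True when the 2^n vectors psi_alpha are orthonormal."""
    points = list(product((0, 1), repeat=n))
    return all(
        isclose(sum(psi(a, x) * psi(b, x) for x in points),
                1.0 if a == b else 0.0, abs_tol=1e-12)
        for a in points for b in points)

def check_shell_eigenvectors(n):
    """Return {(alpha, p): eigenvalue}; raise AssertionError if some psi_alpha
    is not an eigenvector of some shell operator S(n, p)."""
    points = list(product((0, 1), repeat=n))
    table = {}
    for alpha in points:
        vec = [psi(alpha, x) for x in points]
        for p in range(n + 1):
            # Apply S(n, p): sum the components at Hamming distance p from x.
            image = [sum(v for y, v in zip(points, vec) if hamming(x, y) == p)
                     for x in points]
            lam = image[0] / vec[0]
            assert all(isclose(i, lam * v, abs_tol=1e-12)
                       for i, v in zip(image, vec))
            table[alpha, p] = round(lam)
    return table
```

The returned table can also be compared across dimensions to confirm the eigenvalue recurrence (EQ 49) with the sign chosen by the partitioned bit.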

The eigenvectors that have been found to satisfy the shell operator for binary spaces are
not new. In fact these functions were investigated by Hadamard and now bear his name
(Harwit, 1979). For each $\alpha$ there is a Hadamard function $\Psi_\alpha$, the collection of which forms
a basis for the binary space. These functions are the eigenvectors of all activation rules
that depend only radially on the binary weights (address) of a hidden node and the binary
input pattern. I now turn to the issue of an explicit form for the eigenvalues. Since each
eigenvector is labeled by a bit string $\alpha$ it is also necessary to do so for each eigenvalue.
Therefore,

$$\lambda_{\alpha}(n, p) = \lambda_{\alpha'}(n-1, p) + (-1)^{\alpha_n}\,\lambda_{\alpha'}(n-1, p-1) \qquad \text{(EQ 61)}$$

for $\alpha_n$ a bit and $\alpha'$ a bit-string of length $n-1$. The boundary conditions for $\lambda$ are chosen to
be zero for $p$ outside of the interval $[0, n]$ of possible distance values. Hence

$$\lambda_{\alpha}(n, p) = 0, \quad p \notin [0, n]. \qquad \text{(EQ 62)}$$

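To solve (EQ 61) it is expedient to transform it into a generating function by multiplying
by an arbitrary parameter, $t$, raised to the power of $p$ and then summing over all $p$ from
zero to $n$. Let $g_\alpha(n, t)$ be such a generating function expressed as

$$g_{\alpha}(n, t) = \sum_{p=0}^{n} \lambda_{\alpha}(n, p)\, t^{p}. \qquad \text{(EQ 63)}$$

It follows (using the above boundary conditions) that:

$$g_{\alpha}(n, t) = g_{\alpha'}(n-1, t) + (-1)^{\alpha_n}\, t\, g_{\alpha'}(n-1, t), \qquad \text{(EQ 64)}$$

$$g_{\alpha}(n, t) = \big[1 + (-1)^{\alpha_n}\, t\big]\, g_{\alpha'}(n-1, t), \qquad \text{(EQ 65)}$$

$$g_{\alpha}(n, t) = \prod_{i=1}^{n} \big[1 + (-1)^{\alpha_i}\, t\big], \qquad \text{(EQ 66)}$$

and so finally,

$$g_{\alpha}(n, t) = (1 - t)^{\|\alpha\|}\,(1 + t)^{n - \|\alpha\|} \qquad \text{(EQ 67)}$$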

where $\|\alpha\|$ is the Hamming weight of $\alpha$ (the number of bits equal to 1 in the bit-string).
The beauty of the generating function representation is that it reveals the order-independence
of the bits in the eigenvalue label $\alpha$. The generating function can be used to determine
many interesting properties of the eigenvalues. An explicit expression for the
eigenvalues can now be obtained by expanding the factors of (EQ 67) in powers of $t$, collecting
like powers, and equating the coefficients of these powers with the eigenvalues. A set
of expansion coefficients, $C$, are defined here which will be useful later in the expression
of the inverse Kanerva model. They are

$$C_{p}(a, b) = \sum_{k=0}^{p} (-1)^{k} \binom{b}{k} \binom{a}{p-k}. \qquad \text{(EQ 68)}$$

These coefficients have an alternating sign in front of the binomial coefficients and are
closely related to the hypergeometric distribution (Feller, 1968). The shell eigenvalues can
now be written as

$$\lambda_{\alpha}(n, p) = C_{p}(n - \|\alpha\|, \|\alpha\|). \qquad \text{(EQ 69)}$$

So, the eigenvalue problem for a shell operator with binary valued arguments and Hamming
distance activation has been solved. The eigenvectors for the Kanerva model are the
same as these eigenvectors. The eigenvalues are the sum of the shell eigenvalues up to the
activation radius, $r$:

$$\Lambda_{\alpha}(n, r) = \sum_{p=0}^{r} \lambda_{\alpha}(n, p), \qquad \text{(EQ 70)}$$

$$\Lambda_{\alpha}(n, r) = \sum_{p=0}^{r} \sum_{k=0}^{p} (-1)^{k} \binom{\|\alpha\|}{k} \binom{n - \|\alpha\|}{p-k} \qquad \text{(Kanerva model eigenvalues).} \qquad \text{(EQ 71)}$$
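As an illustration of how the generating function produces these numbers, the following Python sketch expands $(1-t)^{w}(1+t)^{n-w}$ one factor at a time and compares the coefficients with the closed form of the expansion coefficients. The function names are assumptions made for this example; only the Hamming weight $w = \|\alpha\|$ is needed because the eigenvalues do not depend on the order of the bits.

```python
from math import comb

def C(p, a, b):
    """Expansion coefficient C_p(a, b): the t^p coefficient of (1+t)^a (1-t)^b."""
    return sum((-1) ** k * comb(b, k) * comb(a, p - k) for k in range(p + 1))

def generating_coefficients(n, w):
    """Coefficients of g(t) = (1-t)^w (1+t)^(n-w), lowest power first."""
    coeffs = [1]
    for i in range(n):
        sign = -1 if i < w else 1
        shifted = [0] + [sign * c for c in coeffs]   # sign * t * g(t)
        coeffs = [a + b for a, b in zip(coeffs + [0], shifted)]
    return coeffs

def kanerva_eigenvalue(n, r, w):
    """Sum of the shell eigenvalues for p = 0..r at Hamming weight w."""
    return sum(C(p, n - w, w) for p in range(r + 1))
```

For example, `generating_coefficients(3, 2)` expands $(1-t)^2(1+t) = 1 - t - t^2 + t^3$, so the Kanerva eigenvalue for $n = 3$, $r = 1$, $w = 2$ is $1 - 1 = 0$, a case in which the model has no inverse.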


6.2 Inverse of the Kanerva model 


Having solved the eigenvalue problem for the Kanerva model there still remains the issue
of whether it has an inverse. This is equivalent to asking whether any of the eigenvalues
are zero. Interestingly, the answer to this question is determined by whether the parity of a
binomial coefficient is even or odd. If it can be shown that the parity of an eigenvalue is
odd then that eigenvalue can not be the number zero. The recursion relation for the shell
operator can be used to show the parity of shell eigenvalues is not a function of the
eigenvalue label $\alpha$. Let $\pi$ be a parity operator defined as

$$\pi(a) = \begin{cases} +1 & a \text{ even}, \\ -1 & a \text{ odd}. \end{cases} \qquad \text{(EQ 72)}$$

With this operator the parity of the sum of two numbers is the product of the parity of the
numbers and the parity of the negative of a number is just the parity of the number, so:

$$\pi(a + b) = \pi(a)\,\pi(b), \qquad \text{(EQ 73)}$$

$$\pi(-a) = \pi(a). \qquad \text{(EQ 74)}$$

The parity of the recurrence relation of the shell eigenvalues then becomes:

$$\pi(\lambda_{\alpha}(n, p)) = \pi\big(\lambda_{\alpha'}(n-1, p) + (-1)^{\alpha_n}\lambda_{\alpha'}(n-1, p-1)\big) \qquad \text{(EQ 75)}$$

$$= \pi(\lambda_{\alpha'}(n-1, p))\;\pi\big((-1)^{\alpha_n}\lambda_{\alpha'}(n-1, p-1)\big),$$

$$= \pi(\lambda_{\alpha'}(n-1, p))\;\pi(\lambda_{\alpha'}(n-1, p-1)).$$

It can be seen the parity for $n$ dimensions is not a function of the $n$th bit of $\alpha$. Since this is
expressed recursively in terms of the parity for $n-1$ dimensions it follows that the parity is
not a function of any of the bits of $\alpha$. As such, knowledge of the parity of $\lambda_{\alpha}(n, p)$ for one
value of $\alpha$ yields the parity for all values of $\alpha$. Choose $\alpha$ equal to the string of all 0’s with
Hamming weight zero, $\|\alpha\| = 0$; then the eigenvalue for this $\alpha$ reduces to a binomial
coefficient expressed as

$$\lambda_{0}(n, p) = \binom{n}{p}. \qquad \text{(EQ 76)}$$

In summary, the parity of a shell eigenvalue with label $\alpha$ is the same as the parity of the
eigenvalue with label 0 which is the same as the parity of a binomial coefficient, so

$$\pi(\lambda_{\alpha}(n, p)) = \pi(\lambda_{0}(n, p)) = \pi\Big(\binom{n}{p}\Big). \qquad \text{(EQ 77)}$$

The parity of the binomial coefficients for $n$ up to 16 is presented in the following diagram.
It is noticed for $n = 2^k - 1$ (e.g. 0, 1, 3, 7, 15, etc.) the parity is constant for all $p$ and
equal to -1. It also appears that the diagram is self recursive; the top triangle down to $2^k - 1$
is copied below itself to the left and to the right with the intermediate spaces filled with
+’s.

Figure 6 Parity table for binomial coefficients.


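It is speculated the parity of a binomial coefficient is given by the rule

$$\pi\Big(\binom{n}{p}\Big) = \begin{cases} -1 & \bar{n}_B \wedge p_B = 0, \\ +1 & \text{otherwise} \end{cases} \qquad \text{(EQ 78)}$$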

where $p_B$ is the base-2 bit-vector representation of $p$, $\bar{n}_B$ is the bit complement of the
base-2 representation of $n$, $\wedge$ is the logical AND of bits and 0 is a bit string of 0’s (all
representations are considered padded with sufficient bits to represent the largest number).
For example, $n = 12$, $p = 4$ implies $n_B = 1100$, $\bar{n}_B = 0011$, $p_B = 0100$, and $\bar{n}_B \wedge p_B = 0000$. So
the parity rule says 12 choose 4 should have odd parity. Looking back at the parity diagram
it can be seen this is true. If this rule is true then when $n$ is one less than a power of
two the binary representation of $n$ has all one-bits and, since $p$ is less than or equal to $n$, the
sets $\bar{n}_B$ and $p_B$ are disjoint and their logical AND is zero for all bits. Now the Kanerva
model makes use of the sum of the shell eigenvalues for $p = 0$ through $r$. The parity of this sum is the
product of the parity of individual eigenvalues which is the product of the parity of the
corresponding binomial coefficients. The parity of the sum of the binomial coefficients is
equal to this product and so can be used to evaluate the parity of the sum of the shell
eigenvalues:

$$\pi(\Lambda_{\alpha}(n, r)) = \pi\Big(\sum_{p=0}^{r} \lambda_{\alpha}(n, p)\Big) \qquad \text{(EQ 79)}$$

$$= \prod_{p=0}^{r} \pi(\lambda_{\alpha}(n, p))$$

$$= \prod_{p=0}^{r} \pi\Big(\binom{n}{p}\Big)$$

$$= \pi\Big(\sum_{p=0}^{r} \binom{n}{p}\Big)$$

which is the parity of the cumulative binomial distribution. The parity table for the cumulative
binomial coefficients is given in the next diagram.

Figure 7 Parity table for cumulative binomial coefficients.


It can be seen for $n = 2^k$ (e.g., 1, 2, 4, 8, 16, etc.) the parity is odd for all coefficients except for
the last entry where $r = n$. As such, no eigenvalue for the Kanerva model can be zero for
any activation radius less than $n$ for $n$ a power of two. It is now possible to state the following
theorem for the Kanerva model:

If the dimensionality of the input space, n, is a power of two and
the activation radius, r, is less than n then the Kanerva model
has an inverse.

There are other combinations of $n$ and $r$ for which the Kanerva model has an inverse; however,
they are not as easily specified.
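Both the parity rule and the theorem lend themselves to a direct numerical check. The sketch below is a hedged illustration with assumed function names: it tests the bit rule of (EQ 78) against exact binomial coefficients (the rule agrees with what is usually called Lucas's theorem for the prime 2), and it scans the Kanerva eigenvalues, which depend on $\alpha$ only through its Hamming weight $w$, for zeros.

```python
from math import comb

def binomial_is_odd_by_rule(n, p):
    """Bit rule: C(n, p) is odd exactly when p has no one-bit where n has a zero-bit."""
    return (~n & p) == 0

def kanerva_eigenvalue(n, r, w):
    """Sum over p = 0..r of the shell eigenvalues C_p(n - w, w) for Hamming weight w."""
    return sum((-1) ** k * comb(w, k) * comb(n - w, p - k)
               for p in range(r + 1) for k in range(p + 1))

def is_invertible(n, r):
    """True when no Kanerva eigenvalue vanishes for dimension n and radius r."""
    return all(kanerva_eigenvalue(n, r, w) != 0 for w in range(n + 1))

def invertible_radii(n):
    """All activation radii r in 0..n for which the model has an inverse."""
    return [r for r in range(n + 1) if is_invertible(n, r)]
```

Calling `invertible_radii(n)` for a few values of `n` gives a concrete view of the other combinations of $n$ and $r$ mentioned above; for instance $n = 3$, $r = 1$ is not invertible, because the eigenvalue for $w = 2$ is zero.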

Knowing the Kanerva model has an inverse we are in a position to derive an explicit
expression for it. Recall that an arbitrary invertible operator can have its inverse expressed as
a linear combination of the outer products of its eigenvectors with the inverse of its
eigenvalues. If $A^{-1}$ is the inverse of the Kanerva model then

$$A^{-1}(n, r) = \sum_{\alpha} \Lambda_{\alpha}^{-1}(n, r)\, \Psi_{\alpha} \Psi_{\alpha}^{T}. \qquad \text{(EQ 80)}$$


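That the Kanerva activation rule is only a function of the distance between its elements
would lead one to suppose its inverse would also have this property. This turns out to be
true and can be demonstrated by breaking the sum over $\alpha$ into two sums, one over
$w = \|\alpha\|$ and the other over all labels with fixed Hamming weight $\|\alpha\|$:

$$A^{-1}_{x,\hat{x}}(n, r) = \sum_{\alpha} \frac{2^{-n/2}(-1)^{\alpha^{T} x}\; 2^{-n/2}(-1)^{\alpha^{T} \hat{x}}}{\sum_{p=0}^{r} C_{p}(n - \|\alpha\|, \|\alpha\|)} \qquad \text{(EQ 81)}$$

$$= 2^{-n} \sum_{\alpha} \frac{(-1)^{\alpha^{T}(x + \hat{x})}}{\sum_{p=0}^{r} C_{p}(n - \|\alpha\|, \|\alpha\|)}$$

$$= 2^{-n} \sum_{w=0}^{n} \frac{\sum_{\alpha \,|\, \|\alpha\| = w} (-1)^{\alpha^{T}(x + \hat{x})}}{\sum_{p=0}^{r} C_{p}(n - w, w)}.$$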


The expression in the numerator of the last equation, interestingly, can be represented in
terms of the expansion coefficients, $C$, and the Hamming distance, $d$, between the vectors $x$
and $\hat{x}$. Now

$$\sum_{\alpha \,|\, \|\alpha\| = w} (-1)^{\alpha^{T}(x + \hat{x})} = \sum_{\alpha \,|\, \|\alpha\| = w} \prod_{i=1}^{n} (-1)^{\alpha_i (x_i + \hat{x}_i)} \qquad \text{(EQ 82)}$$

$$= \sum_{\alpha \,|\, \|\alpha\| = w} \prod_{i=1}^{n} (-1)^{\alpha_i |x_i - \hat{x}_i|}.$$

If a component of $x$ and the corresponding component of $\hat{x}$ are equal then their sum is either zero or two. In either case this
sum times the component of $\alpha$ is even. The quantity $(-1)$ to an even power is 1, therefore
components of $x$ and $\hat{x}$ which are equal do not affect the overall product. Only components
that differ have an effect. It follows one may replace the sum of $x$ and $\hat{x}$ by their difference
in (EQ 82) and retain the same value for the product.

Now $x$ and $\hat{x}$ differ on $d$ bits (define the number of bit differences as $d$). Since we are summing
over all possible vectors $\alpha$ with $w$ one-bits it does not matter where in the $x$ vectors
the differences occur; therefore we may conceptually move them to the beginning of the $x$
vectors:

$$|x - \hat{x}| = \underbrace{11\ldots1}_{d}\;\underbrace{00\ldots0}_{n-d}. \qquad \text{(EQ 83)}$$

The sum over $\alpha$ can be decomposed into two parts where $k$ bits of $\alpha$ fall in the 1’s region
and $w-k$ bits fall in the 0’s region. There are $d$ choose $k$ ways this can happen for the first
case and $n-d$ choose $w-k$ ways this can happen in the second case. The first case contributes
$(-1)^{k}$ and the second contributes unity to the overall product. One then sees

$$\sum_{\alpha \,|\, \|\alpha\| = w} \prod_{i=1}^{n} (-1)^{\alpha_i |x_i - \hat{x}_i|} = \sum_{k=0}^{w} (-1)^{k} \binom{d}{k} \binom{n-d}{w-k} = C_{w}(n-d, d). \qquad \text{(EQ 84)}$$

It can be shown that

$$\binom{n}{w}\, C_{d}(n-w, w) = \binom{n}{d}\, C_{w}(n-d, d). \qquad \text{(EQ 85)}$$
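The counting argument for (EQ 84) and the symmetry (EQ 85) are both easy to test exhaustively for small $n$. The following Python sketch uses assumed helper names; `character_sum` places the $d$ differing bits at the first $d$ positions, as in (EQ 83), and sums over every label $\alpha$ of weight $w$.

```python
from itertools import combinations
from math import comb

def C(p, a, b):
    """Expansion coefficient C_p(a, b): the t^p coefficient of (1+t)^a (1-t)^b."""
    return sum((-1) ** k * comb(b, k) * comb(a, p - k) for k in range(p + 1))

def character_sum(n, w, d):
    """Brute-force sum over all alpha with w one-bits of (-1)^(number of those
    one-bits that fall among the first d positions)."""
    return sum((-1) ** sum(1 for i in ones if i < d)
               for ones in combinations(range(n), w))

def symmetry_holds(n):
    """True when C(n,w) C_d(n-w, w) == C(n,d) C_w(n-d, d) for all 0 <= w, d <= n."""
    return all(comb(n, w) * C(d, n - w, w) == comb(n, d) * C(w, n - d, d)
               for w in range(n + 1) for d in range(n + 1))
```

For instance, with $n = 3$, $w = 2$, $d = 1$ both sides of the symmetry equal $-3$.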

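Combining these results a final expression for the inverse of the Kanerva model is obtained
(see footnote 7):

$$A^{-1}_{x,\hat{x}}(n, r) = 2^{-n} \sum_{w=0}^{n} \frac{\binom{n}{w}\, C_{d}(n-w, w)}{\binom{n}{d} \sum_{p=0}^{r} C_{p}(n-w, w)} \qquad \text{(inverse of Kanerva model)} \qquad \text{(EQ 86)}$$

where

$$d = d_{n}(x, \hat{x}). \qquad \text{(EQ 87)}$$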

The factors in this inverse expression can be given some interpretation when one considers
the product of the inverse with the Kanerva operator along the diagonal, which we know
should be equal to 1. The way the terms collapse shows the interdependence of the parts of
the expression. Writing $A^{-1}_{d}$ and $A_{d}$ for the values of the two kernels at distance $d$:

$$(A^{-1}A)_{x,x}(n, r) = \sum_{\hat{x}} A^{-1}_{x,\hat{x}}(n, r)\, A_{\hat{x},x}(n, r) \qquad \text{(EQ 88)}$$

$$= \sum_{d=0}^{n} \; \sum_{\hat{x} \,|\, d_n(x,\hat{x}) = d} A^{-1}_{x,\hat{x}}(n, r)\, A_{\hat{x},x}(n, r)$$

$$= \sum_{d=0}^{n} \binom{n}{d} A^{-1}_{d}(n, r)\, A_{d}(n, r)$$

$$= \sum_{d=0}^{r} \binom{n}{d} A^{-1}_{d}(n, r)$$

$$= 2^{-n} \sum_{w=0}^{n} \binom{n}{w} \frac{\sum_{d=0}^{r} C_{d}(n-w, w)}{\sum_{p=0}^{r} C_{p}(n-w, w)}$$

$$= 2^{-n} \sum_{w=0}^{n} \binom{n}{w}$$

$$= 1.$$

7. The complexity of this expression caused me to seek verification of its correctness via nonanalytic means.
I programmed the function and for small values of n (up to eight) found that indeed the product of the
Kanerva kernel with its inverse gives the identity matrix.
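A numerical check of this kind can be reproduced along the following lines. This is a minimal sketch, not the program referred to in the report: the function names are assumptions, exact rational arithmetic from Python's `fractions` module is used so the comparison with the identity is exact, and the full product is formed only for small $n$ because the matrices have $2^n$ rows.

```python
from fractions import Fraction
from itertools import product
from math import comb

def C(p, a, b):
    """Expansion coefficient C_p(a, b): the t^p coefficient of (1+t)^a (1-t)^b."""
    return sum((-1) ** k * comb(b, k) * comb(a, p - k) for k in range(p + 1))

def inverse_kernel(n, r, d):
    """Value of the inverse kernel at Hamming distance d, from the closed form
    2^-n * sum_w C(n,w) C_d(n-w,w) / (C(n,d) * sum_p C_p(n-w,w)).
    Raises ZeroDivisionError when an eigenvalue is zero (no inverse exists)."""
    total = Fraction(0)
    for w in range(n + 1):
        eigenvalue = sum(C(p, n - w, w) for p in range(r + 1))
        total += Fraction(comb(n, w) * C(d, n - w, w), comb(n, d) * eigenvalue)
    return total / 2 ** n

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def product_is_identity(n, r):
    """True when (inverse kernel) x (Kanerva kernel with radius r) is the identity."""
    points = list(product((0, 1), repeat=n))
    inv = [inverse_kernel(n, r, d) for d in range(n + 1)]
    for x in points:
        for z in points:
            entry = sum(inv[hamming(x, y)] for y in points if hamming(y, z) <= r)
            if entry != (1 if x == z else 0):
                return False
    return True
```

For $n = 2$, $r = 1$ the inverse kernel takes the values $1/3$, $1/3$, $-2/3$ at distances 0, 1, 2, which already shows the sign change with distance.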


To develop a sense of what the inverse looks like, plots of the activation rule and its inverse
for n=8 are presented below (the thin line is the activation rule for the Kanerva model
and is not in the same scale as the thicker inverse rule):


Douglas G. Danforth, Total Recall in Distributive Associative Memories, January 23, 1991


For r=0 the inverse is just the identity transformation and this plot is not presented. For
r=n, it was shown no inverse exists (all points in the space receive exactly the same information
and there is no way to recover the original data). The plots show the form of the
distance dependence of the inverse operator (write-rule) for r=1 to r=n-1 (n=8). It can be
seen there is a general oscillatory behavior of the inverse. Some regions enfold data with a
positive sign and some regions enfold it with a negative sign.

Figure 9 depicts reading at a location (indicated by the uniformly-shaded gray disc) which
is offset from the center of a write operation. The sum over the disc of the inverse-weighted
data is zero. The magnitude and amplitude of the inverse rule are such that when information
is recalled from memory, using the hard thresholding rule of the Kanerva model,
competing patterns around the read point cancel exactly to leave only the data written at
the point.

The inverse of an activation rule has been obtained. The rule is quite simple and yet its inverse
is not. The oscillatory behavior is reminiscent of the Fourier transform of a step
function which is a damped sinusoid. In the inverse case, however, the oscillations need
not decrease with increasing distance. This concludes the discussion of the Kanerva model.



Figure 9 Reading offset from a write point.

7.0 The Selected-Coordinate Design

Louis Jaeckel (1989) has put forth an alternative design to the Sparse Distributed Memory
of Pentti Kanerva. The design is in the same spirit as Kanerva’s model; however, there are
some major differences. In the Selected-Coordinate Design (SCD) a hidden location will
be activated only if there is an exact match between the input and its ternary valued
weights. The weights are ternary in the sense that don’t care values are introduced. With don’t care
values, attention to the input is restricted to a subset of selected coordinates. Each hidden location
has its own subset, whose coordinates are randomly chosen and have random bit values. If an input
pattern matches the values of the selected coordinates exactly then the hidden location is
activated. For notational convenience I now treat a “bit” as an element of the set {-1, +1} and a
don’t care value as the quantity “0”. The activation rule for the Jaeckel model can be
specified as

$$A_{x,\hat{x}}(n) = \begin{cases} 1 & x_i \hat{x}_i \ge 0, \; i = 1, 2, \ldots, n, \\ 0 & \text{otherwise,} \end{cases} \qquad \text{(EQ 89)}$$

for $x, \hat{x} \in \{-1, 0, 1\}^{n}$.

For activation to occur, if $x$ and $\hat{x}$ differ on a bit then one of them must be a don’t care value (0). Notice I have
extended the range of input, $x$, to also include the value zero. This is for symmetry and
allows a ready solution to the eigenvalue problem. A zero value for input can be considered
a “don’t know” bit. If all input is known then a zero value will never occur. A don’t know
matches any -1, 0, or +1 value.

7.1 Eigenvalue problem for the Selected-Coordinate Design 

The exact match criterion of the Selected-Coordinate Design allows a factoring of the acti- 
vation rule into a product of indicator functions. This factoring immediately reveals the ei- 
genvalue problem can be solved as a product of functions. Each function is a 3x1 vector 
solution of a 3x3 matrix, whose eigenvalues are readily calculated. I now go through the 
steps in obtaining the eigenvectors and eigenvalues for the Selected-Coordinate Design. 

Write the activation function as

$$A_{x,\hat{x}}(n) = \prod_{i=1}^{n} I[x_i \hat{x}_i \ge 0]. \qquad \text{(EQ 90)}$$

Each factor in the product expression has the same form and is the activation rule for a single
bit. This one-bit activation rule can be specified as a 3x3 matrix indicating whether
activation will occur for each of the three possible hidden states of the first bit and each of
the three possible states of the input for the first bit. Let $A(1)$ be this 3x3 matrix, with rows
and columns ordered -1, 0, +1; then

$$A(1) = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix}. \qquad \text{(EQ 91)}$$


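That is, activation will not occur for $x\hat{x} = -1$. The eigenvalue problem for $A(1)$ is solved
by the eigenvector $\psi$ with eigenvalue $\lambda$. The problem is specified as

$$A(1)\,\psi = \lambda\,\psi \qquad \text{(EQ 92)}$$

and has nontrivial solutions only if the determinant

$$\begin{vmatrix} 1-\lambda & 1 & 0 \\ 1 & 1-\lambda & 1 \\ 0 & 1 & 1-\lambda \end{vmatrix} \qquad \text{(EQ 93)}$$

is zero. This leads to the cubic equation in $\lambda$

$$(1 - \lambda)^{3} - 2(1 - \lambda) = 0, \qquad \text{(EQ 94)}$$

which has the three solutions

$$\lambda_{a} = 1 + a\sqrt{2} \quad \text{for } a = -1, 0, 1. \qquad \text{(EQ 95)}$$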

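For each $a$ there is a corresponding three component eigenvector $\psi_a$ given by

$$\psi_{a} = \frac{1}{\sqrt{2 + 2a^{2}}} \begin{pmatrix} 1 \\ a\sqrt{2} \\ 2a^{2} - 1 \end{pmatrix}. \qquad \text{(EQ 96)}$$

These three eigenvectors can also be grouped and displayed as the columns (for $a = -1, 0, 1$)
of the 3x3 matrix $\Psi$ where

$$\Psi = \begin{pmatrix} \frac{1}{2} & \frac{1}{\sqrt{2}} & \frac{1}{2} \\ \frac{-1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ \frac{1}{2} & \frac{-1}{\sqrt{2}} & \frac{1}{2} \end{pmatrix}. \qquad \text{(EQ 97)}$$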


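The full eigenvector for the Selected-Coordinate Design is now given as the product of
elements of $\Psi$, namely

$$\Psi_{\alpha}(x) = \prod_{i=1}^{n} \Psi_{x_i, \alpha_i}, \qquad \text{(EQ 98)}$$

$$\alpha_i \in \{-1, 0, 1\}, \qquad \text{(EQ 99)}$$

where $\alpha$ is a “bit” string acting as a label for a specific eigenvector. There are $3^{n}$ such
eigenvectors. The eigenvalues $\Lambda_{\alpha}$ for the full $n$-bit space are the product of the one-bit
eigenvalues $\lambda_{\alpha_i}$, namely

$$\Lambda_{\alpha} = \prod_{i=1}^{n} \lambda_{\alpha_i}. \qquad \text{(EQ 100)}$$

These are further expressed as:

$$\Lambda_{\alpha} = \prod_{i=1}^{n} \big(1 + \alpha_i \sqrt{2}\big) \qquad \text{(EQ 101)}$$

$$= (1 - \sqrt{2})^{n_{-}}\,(1)^{n_{0}}\,(1 + \sqrt{2})^{n_{+}},$$

$$= (1 - \sqrt{2})^{n_{-}}\,(1 + \sqrt{2})^{n_{+}}.$$

The $n$’s satisfy the equation

$$n = n_{-} + n_{0} + n_{+} \qquad \text{(EQ 102)}$$

where $n_{-}$, $n_{0}$, and $n_{+}$ are the count of the number of -1’s, 0’s, and +1’s, respectively, in
the bit string $\alpha$.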
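The factorization can be illustrated with a small numerical check. In the Python sketch below (function names are assumptions made for this example) the one-bit eigenpairs follow (EQ 95) and (EQ 96), the full eigenvalue follows (EQ 101), and `residual` measures how far the product eigenvector is from satisfying the eigenvalue equation for the full activation rule.

```python
from itertools import product
from math import sqrt

def one_bit_eigenpair(a):
    """Eigenvalue 1 + a*sqrt(2) and normalized eigenvector for a in {-1, 0, 1};
    vector components are ordered by input value -1, 0, +1."""
    lam = 1 + a * sqrt(2)
    norm = sqrt(2 + 2 * a * a)
    return lam, [1 / norm, a * sqrt(2) / norm, (2 * a * a - 1) / norm]

def full_eigenvalue(alpha):
    """Product of one-bit eigenvalues: (1 - sqrt 2)^n_minus * (1 + sqrt 2)^n_plus."""
    n_minus = sum(1 for a in alpha if a == -1)
    n_plus = sum(1 for a in alpha if a == 1)
    return (1 - sqrt(2)) ** n_minus * (1 + sqrt(2)) ** n_plus

def activation(x, y):
    """1 when x_i * y_i >= 0 on every coordinate, otherwise 0."""
    return 1 if all(a * b >= 0 for a, b in zip(x, y)) else 0

def residual(alpha):
    """Largest |(A psi - Lambda psi)(x)| over all x in {-1, 0, 1}^n."""
    points = list(product((-1, 0, 1), repeat=len(alpha)))
    vecs = [one_bit_eigenpair(a)[1] for a in alpha]

    def psi(x):
        value = 1.0
        for vec, xi in zip(vecs, x):
            value *= vec[xi + 1]          # shift -1, 0, +1 to index 0, 1, 2
        return value

    lam = full_eigenvalue(alpha)
    return max(abs(sum(activation(x, y) * psi(y) for y in points) - lam * psi(x))
               for x in points)
```

A residual at the level of floating-point rounding for every label $\alpha$ indicates the product form is consistent with the full activation rule.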

7.2 Inverse of the Selected-Coordinate Design 

Recall the inverse of a matrix operator can be expressed as

$$A^{-1} = \sum_{\alpha} \Lambda_{\alpha}^{-1}\, \Psi_{\alpha} \Psi_{\alpha}^{T}. \qquad \text{(EQ 103)}$$

It can be seen from (EQ 100) that none of the eigenvalues for the Selected-Coordinate Design
are zero and so the SCD has an inverse. Since the eigenvectors and the eigenvalues
factor for each bit it follows that the inverse also factors into a product

$$A^{-1}_{x,\hat{x}}(n) = \prod_{i=1}^{n} A^{-1}_{x_i,\hat{x}_i}(1) \qquad \text{(EQ 104)}$$

of one-bit inverses. The one-bit inverse is found to be (see footnote 8)

$$A^{-1}(1) = \begin{pmatrix} 0 & 1 & -1 \\ 1 & -1 & 1 \\ -1 & 1 & 0 \end{pmatrix}. \qquad \text{(EQ 105)}$$


The $A^{-1}(1)$ matrix has constant values along 45 degree diagonals. This corresponds to the
constraint that $x_i + \hat{x}_i$ is a constant. In fact, an element of the matrix is a function only of
the absolute value of this sum. Note it is the sum and not the difference between the components
that determines the behavior of the inverse. As such, the inverse of the Selected-Coordinate
Design can not be expressed as a function of the distance between $x$ and $\hat{x}$,
which depends upon their difference. This fact is not intuitively obvious since the
Selected-Coordinate Design’s activation rule can be expressed as a function of their difference.
From the product form for the full inverse, with the insight that the sum of components
determines the inverse’s value, it is possible to write a more explicit form for the inverse, namely

$$A^{-1}_{x,\hat{x}}(n) = (-1)^{k_0}\,(1)^{k_1}\,(0)^{k_2} \qquad \text{(EQ 106)}$$

where the $k_t$ are the count of the number of terms $|x_i + \hat{x}_i|$ equal to $t$:

$$k_t = \sum_{i=1}^{n} I\big[\,|x_i + \hat{x}_i| = t\,\big], \qquad \text{(EQ 107)}$$

$$n = k_0 + k_1 + k_2. \qquad \text{(EQ 108)}$$

Curves of constant inverse are no longer hyperspheres (cubes) as in the Kanerva model
but are curves with constant $k$ values. If any two corresponding bits ($\pm 1$) of the input and
the hidden location are the same then the store for that location is not modified (inverse is
zero); otherwise the datum is either added to or subtracted from the store. This completes the
discussion of the Selected-Coordinate Design.
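The explicit form of the inverse is simple enough to verify exhaustively for a few bits. The sketch below is an illustrative check with assumed function names: it evaluates the inverse both from the counts $k_t$ and as a product of one-bit entries, and confirms that multiplying by the activation rule gives the identity over $\{-1, 0, 1\}^n$.

```python
from itertools import product

# One-bit inverse, rows and columns ordered -1, 0, +1.
A1_INV = [[0, 1, -1],
          [1, -1, 1],
          [-1, 1, 0]]

def activation(x, y):
    """1 when x_i * y_i >= 0 on every coordinate, otherwise 0."""
    return 1 if all(a * b >= 0 for a, b in zip(x, y)) else 0

def inverse_entry(x, y):
    """(-1)^k0 * 1^k1 * 0^k2, where k_t counts coordinates with |x_i + y_i| = t."""
    k = [0, 0, 0]
    for a, b in zip(x, y):
        k[abs(a + b)] += 1
    return 0 if k[2] else (-1) ** k[0]

def inverse_entry_product(x, y):
    """The same entry computed as a product of one-bit inverse entries."""
    value = 1
    for a, b in zip(x, y):
        value *= A1_INV[a + 1][b + 1]
    return value

def is_inverse(n):
    """True when sum_y inverse(x, y) * activation(y, z) is the identity."""
    points = list(product((-1, 0, 1), repeat=n))
    return all(
        sum(inverse_entry(x, y) * activation(y, z) for y in points)
        == (1 if x == z else 0)
        for x in points for z in points)
```

Because every entry is an integer, the comparison with the identity is exact and no tolerance is needed.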


8. It should be noted for the simple form of the Selected-Coordinate Design it is not necessary to go through
the eigenvalue problem to obtain this inverse.

8.0 Radial activation rules

I now turn to associative memories with real-valued input and output that have activation 
rules only a function of the distance between input and weights. Such systems have been 
considered by Poggio (1989) under the name of (Generalized) Radial Basis Functions. In 
this section I develop the inverse for such systems. The task at hand is to find a set of ei- 
genfunctions and eigenvalues which satisfy radial activation rules for real-valued input. 
One can go quite far in this task without specifying the exact form of the radial depen- 
dence. It will be shown spherical Bessel functions play a fundamental role in the radial de- 
pendence and Gegenbauer polynomials play a role in the angular dependence. 

8.1 Eigenvalue problem for radial activation rules 

The task is to solve the following operator problem with $n$-dimensional real-valued variables
and an activation rule that is a function only of the distance between its arguments:

$$A\Psi = \Lambda\Psi, \qquad \text{(EQ 109)}$$

$$\int_{\text{Volume}} A(n, x, \hat{x})\,\Psi(\hat{x})\,d\hat{x} = \Lambda(n)\,\Psi(x)$$

where

$$x, \hat{x} \in R^{n} \qquad \text{(EQ 110)}$$

and

$$A(n, x, \hat{x}) = f(\|x - \hat{x}\|_{n}) \qquad \text{(EQ 111)}$$

for some scalar function $f$. Now (EQ 109) is the multidimensional analogue of a homogeneous
linear integral equation of the second kind (Smithies, 1962) whose solution can be
determined from the zeros of the Fredholm determinant. Rather than follow that path here,
it is possible to decompose the operator in terms of shells and then apply Gauss’s theorem
for integration over surfaces. One then obtains a differential equation that is separable in
the radial and nonradial components.

As with the Kanerva model, the radial activation rule can be decomposed into a superposition
of shell operators

$$A(n, x, \hat{x}) = \int_{0}^{\infty} f(\rho)\, S(n, \rho, x, \hat{x})\, d\rho \qquad \text{(EQ 112)}$$

where

$$S(n, \rho, x, \hat{x}) = \delta(\rho - \|x - \hat{x}\|_{n}). \qquad \text{(EQ 113)}$$

The quantity $\delta$ is the Dirac delta function (Friedman, 1956) and has the property that for
every continuous function $\varphi$,

$$\int_{-\infty}^{\infty} \delta(x)\,\varphi(x)\,dx = \varphi(0). \qquad \text{(EQ 114)}$$

Operators $S$ with different values of $\rho$ commute and so a common set of eigenfunctions
independent of $\rho$ can be found. That is,

$$\int_{\text{Volume}} S(n, \rho, x, \hat{x})\,\Psi(\hat{x})\,d\hat{x} = \lambda(n, \rho)\,\Psi(x) \qquad \text{(EQ 115)}$$

with the eigenvalues of $A$ given in terms of the eigenvalues of $S$ by

$$\Lambda(n) = \int_{0}^{\infty} \lambda(n, \rho)\, f(\rho)\, d\rho. \qquad \text{(EQ 116)}$$

o 

Note the generality of this expression. Nothing has been said about the exact form of $f$. It
need not be monotonically decreasing. It only need be sufficiently smooth so the integral
exists. This decoupling of the exact form of radial dependence of the activation rule from
the shell eigenvalue problem is very powerful for it allows the determination of the
eigenfunctions for all radial activation rules.

The task of solving the eigenvalue problem for A is now reduced to the task of solving the 
eigenvalue problem for S. 

8.2 Radial shell eigenvalue problem 

Now, the integral over a volume of a delta function that is a function only of the radial
component becomes an integral over a surface area. Write the dummy variable of integration
as

$$\hat{x} = x + r u \qquad \text{(EQ 117)}$$

where $r$ is the distance between $x$ and $\hat{x}$ and $u$ is a unit vector pointing from $x$ to $\hat{x}$. Note
the differential volume element can be written as $d\hat{x} = r^{n-1}\,(dr)\,(d\Omega)$ where $d\Omega$ is a
differential solid angle. One finds:

$$\int_{\text{Volume}} S(n, \rho, x, \hat{x})\,\Psi(\hat{x})\,d\hat{x} = \int_{\text{Volume}} \delta(\rho - \|x - \hat{x}\|)\,\Psi(\hat{x})\,d\hat{x} \qquad \text{(EQ 118)}$$

$$= \int_{\Omega} \Big[ \int_{0}^{\infty} \delta(\rho - r)\,\Psi(x + r u)\, r^{n-1}\, dr \Big] d\Omega$$

$$= \int_{\Omega} \Psi(x + \rho u)\, \rho^{n-1}\, d\Omega.$$

The shell eigenvalue problem becomes

$$\int_{\Omega} \Psi(x + \rho u)\, \rho^{n-1}\, d\Omega = \lambda(n, \rho)\,\Psi(x). \qquad \text{(EQ 119)}$$

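This integral form can be transformed into a set of differential equations for $\Psi$ and $\lambda$ by
first differentiating (EQ 119) by $\rho$. In this process the derivative of $\Psi$ under the integral
sign must be taken. This leads to the gradient of $\Psi$ dotted with the unit surface normal $u$:

$$\frac{\partial \Psi}{\partial \rho} = \sum_{i=1}^{n} \frac{\partial \Psi}{\partial \hat{x}_i} \frac{\partial \hat{x}_i}{\partial \rho} \qquad \text{(EQ 120)}$$

$$= u \cdot \nabla\Psi.$$

So

$$\frac{\partial}{\partial \rho}\big[\Psi \rho^{n-1}\big] = (n-1)\,\rho^{-1}\big[\Psi \rho^{n-1}\big] + (u \cdot \nabla\Psi)\,\rho^{n-1}. \qquad \text{(EQ 121)}$$

Therefore, we find

$$\frac{\partial}{\partial \rho} \int_{\text{Area}} \Psi\, da = (n-1)\,\rho^{-1} \int_{\text{Area}} \Psi\, da + \int_{\text{Area}} (u \cdot \nabla\Psi)\, da. \qquad \text{(EQ 122)}$$

Now one may exploit Gauss’s theorem (Richards, 1959) which holds for arbitrary dimensional
spaces so that

$$\int_{\text{Area}} (u \cdot \nabla\Psi)\, da = \int_{\text{Volume}} \big(\nabla^{2}\Psi\big)\, d\hat{x}. \qquad \text{(EQ 123)}$$

That is, the surface integral of a vector field dotted into the normal of the differential element
of the surface is equal to the volume integral of the divergence of the vector field
over the volume enclosed by the surface. In our case the vector field is the gradient of a
scalar and so the divergence becomes the Laplacian, $\nabla^{2}$, of the scalar function.

$$\nabla^{2} = \sum_{j=1}^{n} \frac{\partial^{2}}{\partial x_j^{2}} \qquad \text{(the Laplacian).} \qquad \text{(EQ 124)}$$

Now the Laplacian is taken relative to the argument of $\Psi$, namely $\hat{x}$. This is equivalent to
the Laplacian relative to $x$ evaluated at $\hat{x}$ since from (EQ 117)

$$\nabla^{2}_{\hat{x}}\,\Psi(\hat{x}) = \nabla^{2}_{x}\,\Psi(x + \rho u). \qquad \text{(EQ 125)}$$

Therefore, the Laplacian may be moved outside of the integral sign leaving a volume integral
of $\Psi$ as

$$\frac{\partial}{\partial \rho} \int_{\text{Area}} \Psi\, da = (n-1)\,\rho^{-1} \int_{\text{Area}} \Psi\, da + \nabla^{2} \int_{\text{Volume}} \Psi\, d\hat{x}. \qquad \text{(EQ 126)}$$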

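The volume integral of $\Psi$ is the radial integral of the surface integral of $\Psi$. The surface
integral is by definition the eigenvalue problem and can be replaced by its right-hand side.
The radial integral I will leave in integral form for a moment:

$$\int_{\text{Volume}} \Psi\, d\hat{x} = \int_{0}^{\rho} \Big[ \int_{\text{Area}} \Psi\, da \Big] dr = \Big[ \int_{0}^{\rho} \lambda(n, r)\, dr \Big] \Psi(x). \qquad \text{(EQ 127)}$$

The original differentiation of the area integral of $\Psi$ with respect to $\rho$ is equal to the
derivative of the right-hand side of the eigenvalue equation, so

$$\frac{\partial \lambda}{\partial \rho}\,\Psi = (n-1)\,\frac{\lambda}{\rho}\,\Psi + \Big[ \int_{0}^{\rho} \lambda(n, r)\, dr \Big] \nabla^{2}\Psi(x). \qquad \text{(EQ 128)}$$

A second differentiation with respect to $\rho$ removes the integral and yields the differential
equation

$$\frac{\partial^{2} \lambda}{\partial \rho^{2}}\,\Psi = (n-1)\,\frac{\partial}{\partial \rho}\Big[\frac{\lambda}{\rho}\Big]\Psi + \lambda\,\nabla^{2}\Psi(x). \qquad \text{(EQ 129)}$$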


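By dividing both sides of this equation by the product $\lambda\Psi$ and rearranging terms we obtain
an equation whose left-hand side is a function only of $\rho$ and whose right-hand side is
a function only of $x$. The equation is

$$\frac{1}{\lambda}\frac{\partial^{2} \lambda}{\partial \rho^{2}} - \frac{(n-1)}{\lambda}\frac{\partial}{\partial \rho}\Big[\frac{\lambda}{\rho}\Big] = \frac{1}{\Psi}\nabla^{2}\Psi(x) = \text{constant}. \qquad \text{(EQ 130)}$$

The only way this equation can hold for all $\rho$ and $x$ is for each side to be equal to a constant,
say, $-\omega^{2}$. The two differential equations, one for the eigenvalues and the other for
the eigenfunctions, become

$$\frac{\partial^{2} \lambda}{\partial \rho^{2}} - (n-1)\,\frac{\partial}{\partial \rho}\Big[\frac{\lambda}{\rho}\Big] + \omega^{2}\lambda = 0 \qquad \text{(radial eigenvalues)} \qquad \text{(EQ 131)}$$

and

$$\nabla^{2}\Psi(x) + \omega^{2}\,\Psi = 0 \qquad \text{(radial eigenfunctions).} \qquad \text{(EQ 132)}$$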


8.3 Solution of differential equations for radial activation rules 

I now proceed with the process of finding the solution to the differential equations for the 
eigenvalues and eigenvectors of radial activation rules. 


8.3.1 Radial eigenvalues 

Differential equations of the following form (Gradshteyn, 1965, EQ 8.491.6; see footnote 9)

$$u'' + \frac{1 - 2\alpha}{z}\,u' + \Big[\beta^{2} + \frac{\alpha^{2} - \nu^{2}}{z^{2}}\Big] u = 0 \qquad \text{(EQ 133)}$$

have solutions

$$u = z^{\alpha} Z_{\nu}(\beta z) \qquad \text{(EQ 134)}$$

where the $Z$ are any of the Bessel functions of the first kind, $J$, second kind, $N$ (Neumann
functions), or third kind, $H$ (Hankel’s functions). When (EQ 131) is expanded we find it
corresponds to a form of (EQ 133), namely

$$\lambda'' - \frac{n-1}{\rho}\,\lambda' + \Big[\omega^{2} + \frac{n-1}{\rho^{2}}\Big]\lambda = 0. \qquad \text{(EQ 135)}$$

Using the appropriate associations ($\alpha = n/2$, $\beta = \omega$, $\nu = n/2 - 1$) one can then write down
the following solution for $\lambda$ as

$$\lambda(n, \omega, \rho) = (\omega\rho)^{n/2}\, Z_{\frac{n}{2}-1}(\omega\rho). \qquad \text{(EQ 136)}$$

The order of the Bessel function $Z$ is $n/2-1$ which is integral or half integral depending
upon the dimensionality of the space. For $n$ odd, the solutions are called spherical Bessel
functions (Sneddon, 1961, p120).

9. There is a double misprint in one of the equations on page 971 of Gradshteyn. The correct form for
equation 8.491.3 is

$$u'' + \frac{1 - 2\alpha}{r}\,u' + \Big[\big(\beta\gamma r^{\gamma - 1}\big)^{2} + \frac{\alpha^{2} - \nu^{2}\gamma^{2}}{r^{2}}\Big] u = 0, \qquad u = r^{\alpha} Z_{\nu}(\beta r^{\gamma}).$$

The full radial eigenvalues are then the integral over $\rho$ of the shell eigenvalues weighted
by the radial activation function $f$. This is expressed as

$$\Lambda(n, \omega) = \int_{0}^{\infty} \lambda(n, \omega, \rho)\, f(\rho)\, d\rho \qquad \text{(radial eigenvalues).} \qquad \text{(EQ 137)}$$

An explicit form for $\Lambda$ will depend upon the specification of $f$.
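The Bessel-function solution for the shell eigenvalues can be spot-checked numerically. The sketch below is a limited illustration under stated assumptions: it fixes $n = 3$, takes $Z$ to be the Bessel function of the first kind, uses the elementary closed form $J_{1/2}(z) = \sqrt{2/(\pi z)}\,\sin z$, and evaluates the left-hand side of (EQ 135) with central finite differences. The function names and the step size `h` are choices made for this example.

```python
from math import pi, sin, sqrt

N = 3  # dimensionality; chosen because J_{1/2} has an elementary closed form

def shell_eigenvalue(rho, omega):
    """(omega*rho)^(N/2) * J_{N/2 - 1}(omega*rho) for N = 3."""
    z = omega * rho
    bessel_half = sqrt(2 / (pi * z)) * sin(z)     # J_{1/2}(z)
    return z ** (N / 2) * bessel_half

def eq135_residual(rho, omega, h=1e-3):
    """lambda'' - (N-1)/rho * lambda' + (omega^2 + (N-1)/rho^2) * lambda,
    with the derivatives approximated by central differences of step h."""
    lam = lambda s: shell_eigenvalue(s, omega)
    d1 = (lam(rho + h) - lam(rho - h)) / (2 * h)
    d2 = (lam(rho + h) - 2 * lam(rho) + lam(rho - h)) / h ** 2
    return d2 - (N - 1) / rho * d1 + (omega ** 2 + (N - 1) / rho ** 2) * lam(rho)
```

For $n = 3$ the eigenvalue reduces to a constant times $\omega\rho\,\sin(\omega\rho)$, and the residual should be small compared with the individual terms, limited only by the finite-difference error.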


8.3.2 Radial eigenfunctions 

The differential equation for the eigenfunctions is a point-source form of the wave-diffusion
equation (Richards, 1959) with no time dependence. Its solution in $n$-dimensional radial
coordinates is given in terms of hyperspherical harmonics (Avery, 1989). The first two
chapters of Avery’s book are a lovely summary of the generalization of spherical harmonics
to $n$ dimensions. I follow closely his derivations here. The generalized Laplacian operator
$\nabla^{2}$ can be written in the form

$$\nabla^{2} = \sum_{j=1}^{n} \frac{\partial^{2}}{\partial x_j^{2}} = \frac{1}{r^{n-1}} \frac{\partial}{\partial r}\Big[ r^{n-1} \frac{\partial}{\partial r} \Big] - \frac{L^{2}}{r^{2}} \qquad \text{(EQ 138)}$$

where $L^{2}$ is the generalized angular momentum operator, defined by

$$L^{2} = -\sum_{i>j} L_{ij}^{2} \qquad \text{(EQ 139)}$$

and

$$L_{ij} = x_i \frac{\partial}{\partial x_j} - x_j \frac{\partial}{\partial x_i}. \qquad \text{(EQ 140)}$$

The eigenfunctions, $Y_{l,m}(\Omega)$, of the generalized angular momentum operator $L^{2}$ satisfy

$$L^{2}\, Y_{l,m}(\Omega) = l(l + n - 2)\, Y_{l,m}(\Omega) \qquad \text{(EQ 141)}$$

and are called hyperspherical harmonics. They are purely angular functions, independent
of the hyperradius, $r$. They are also orthonormal since

$$\int_{\Omega} Y^{*}_{l',m'}(\Omega)\, Y_{l,m}(\Omega)\, d\Omega = \delta_{l'l}\,\delta_{m'm}. \qquad \text{(EQ 142)}$$

The hyperspherical harmonics also satisfy a sum rule

$$C_{l}(u \cdot \hat{u}) = K_{l} \sum_{m} Y^{*}_{l,m}(\hat{\Omega})\, Y_{l,m}(\Omega) \qquad \text{(EQ 143)}$$

where $K_{l}$ is a normalization constant that depends only on $l$ and $n$ (EQ 144),
and $u$, $\hat{u}$ are unit vectors in the directions specified by $\Omega$, $\hat{\Omega}$. The $C$’s are called Gegenbauer
polynomials and they satisfy the same differential equation as the hyperspherical
harmonics.


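We now look for a solution of (EQ 132) by the method of separation of variables where the eigenfunction is written as a product of hyperspherical harmonics and a radial function, $R(r)$, to be determined. This is written as

$$R_\omega(r)\, Y_{l,m}(\Omega).$$ (EQ 145)

When we do this the differential equation becomes

$$\left[ \frac{1}{r^{n-1}} \frac{\partial}{\partial r}\left( r^{n-1} \frac{\partial}{\partial r} \right) - \frac{L^2}{r^2} \right] \left( R_\omega(r)\, Y_{l,m}(\Omega) \right) + \omega^2 R_\omega(r)\, Y_{l,m}(\Omega) = 0.$$ (EQ 146)

Applying the operations to each function followed by division by $RY$ yields

$$r^2 \left[ \frac{1}{R_\omega(r)} \frac{1}{r^{n-1}} \frac{\partial}{\partial r}\left( r^{n-1} \frac{\partial R_\omega(r)}{\partial r} \right) + \omega^2 \right] = \frac{L^2\, Y_{l,m}(\Omega)}{Y_{l,m}(\Omega)} = \text{constant}.$$ (EQ 147)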


The radial and angular dependencies have been separated. The equation must hold for all values of the independent variables $r$ and $\Omega$, which can only be satisfied if each side of the equation is equal to a constant. In this case, however, we know the constant must be equal to the eigenvalue for the angular function $Y_{l,m}$, which is $l(l+n-2)$. Therefore, the differential equation for the radial dependence becomes

$$R''_{\omega,l}(r) + \frac{n-1}{r}\, R'_{\omega,l}(r) + \left[ \omega^2 - \frac{l(l+n-2)}{r^2} \right] R_{\omega,l}(r) = 0$$ (EQ 148)

where the index $l$ has been included to show $R$’s dependence upon the angular momentum parameter. With appropriate boundary conditions specified for $R$, the parameter $\omega$ will also be constrained.


The differential equation for $R$ is remarkably similar to that for $\lambda$ and yields similar Bessel function solutions, namely

$$R_{\omega,l}(r) \propto (\omega r)^{1-n/2}\, Z_{n/2-1+l}(\omega r)$$ (EQ 149)

where now the order of the Bessel function is also a function of the angular momentum, $l$, and the power of the hyperradius, $r$, is now negative.
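
As an illustrative numerical check (not part of the original derivation), the sketch below substitutes the Bessel form of (EQ 149) into the radial equation (EQ 148) and evaluates the left-hand side by central differences. It assumes $Z = J$ (Bessel functions of the first kind), and the helper names `bessel_j`, `radial_solution` and `ode_residual` are hypothetical. The power series used for $J_\nu$ is only adequate for small to moderate arguments.

```python
import math

def bessel_j(nu, x, terms=40):
    """Bessel function of the first kind J_nu(x), x >= 0, summed from its power series."""
    return sum(
        (-1) ** k / (math.factorial(k) * math.gamma(k + nu + 1)) * (x / 2) ** (2 * k + nu)
        for k in range(terms)
    )

def radial_solution(r, omega, n, l):
    """Candidate solution (omega r)^(1 - n/2) * J_{n/2 - 1 + l}(omega r)."""
    x = omega * r
    return x ** (1 - n / 2) * bessel_j(n / 2 - 1 + l, x)

def ode_residual(r, omega, n, l, h=1e-3):
    """Left-hand side of the radial equation, derivatives taken by central differences."""
    r0 = radial_solution(r, omega, n, l)
    rp = radial_solution(r + h, omega, n, l)
    rm = radial_solution(r - h, omega, n, l)
    first = (rp - rm) / (2 * h)
    second = (rp - 2 * r0 + rm) / h ** 2
    return second + (n - 1) / r * first + (omega ** 2 - l * (l + n - 2) / r ** 2) * r0

if __name__ == "__main__":
    # The residual should be small (limited by the finite-difference step h).
    for n, l in [(3, 0), (4, 1), (7, 2)]:
        print(n, l, ode_residual(r=1.3, omega=2.0, n=n, l=l))
```

The residual is limited by the step size `h` rather than by the formula itself; shrinking `h` moderately should reduce it until rounding error takes over.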


8.3.3 Boundary conditions 

In electrostatics there usually are natural constraints on the class of functions at the boundary of the domain under consideration. Either the electric field goes to zero on the boundary or its derivative does. For associative memories and neural networks the issue of boundary conditions is not usually considered. Since I do not have a good sense of what conditions may be appropriate for associative memories, I will arbitrarily consider functions with Dirichlet boundary conditions and let the reader modify the solutions from here on for his own situation if this assumption is not appropriate.

Dirichlet boundary conditions (Jackson, 1962, p. 15) specify the value of the function on a closed surface. This value I take to be zero. If the input vector, $x$, is constrained to lie within a sphere of radius $a$, then by taking $a$ large enough the restriction to zero values on the boundary does not markedly restrict the values of the function at interior points.

The condition that the eigenfunction is zero on the boundary $r = a$ places a restriction on the allowable values of $\omega$ through the zeros of the Bessel function.

$$R_{\omega,l}(a) = 0$$ (Dirichlet boundary condition) (EQ 150)

implies

$$Z_{n/2-1+l}(\omega_{k,l}\, a) = 0$$ (EQ 151)

where

$$\omega_{k,l} = \frac{z_{k,l}}{a}$$ (EQ 152)
with $z_{k,l}$ a zero of the Bessel function. For each value of $l$ (and $n$) there are a countably infinite number of zeros ($k$ values) of $Z$. I now label the radial eigenfunctions with $k$ and introduce a normalization factor $N$. One can now write

$$R_{k,l}(r) = N_{k,l}\, (\omega_{k,l}\, r)^{1-n/2}\, Z_{n/2-1+l}(\omega_{k,l}\, r)$$ (EQ 153)

where

$$\int_0^a R_{k,l}^2(r)\, r^{n-1}\, dr = 1$$ (EQ 154)

and

$$N_{k,l} = \omega_{k,l}^{\,n/2-1} \left[ \int_0^a Z^2_{n/2-1+l}(\omega_{k,l}\, r)\, r\, dr \right]^{-1/2}.$$ (EQ 155)

The $R$ functions also satisfy the following orthogonality condition

$$\int_0^a R_{k,l}(r)\, R_{k',l}(r)\, r^{n-1}\, dr = \delta_{k k'}$$ (EQ 156)

since the Bessel functions satisfy (see note 10)

$$\int_0^a Z_{n/2-1+l}(\omega_{k,l}\, r)\, Z_{n/2-1+l}(\omega_{k',l}\, r)\, r\, dr = 0, \quad k \ne k'.$$ (EQ 157)

The eigenfunctions are now written as

$$R_{k,l}(r)\, Y_{l,m}(\Omega).$$ (EQ 158)
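
The quantization and orthogonality statements above can be checked numerically. The following is a minimal sketch, assuming $Z = J$ and a unit-free radius `a`; the function names are hypothetical. It locates the first few zeros of $J_\nu$ with $\nu = n/2 - 1 + l$ by a sign-change scan plus bisection, forms the frequencies as zero divided by `a`, and evaluates the normalized overlap integrals by Simpson's rule. The prefactors of the radial functions cancel against the normalization, so only the integrals of $J_\nu J_\nu\, r$ are needed.

```python
import math

def bessel_j(nu, x, terms=40):
    """Bessel function of the first kind J_nu(x), x >= 0, summed from its power series."""
    return sum(
        (-1) ** k / (math.factorial(k) * math.gamma(k + nu + 1)) * (x / 2) ** (2 * k + nu)
        for k in range(terms)
    )

def bessel_zeros(nu, count, step=0.05):
    """First `count` positive zeros of J_nu: scan for sign changes, then bisect."""
    zeros, x = [], step
    prev = bessel_j(nu, x)
    while len(zeros) < count:
        nxt = bessel_j(nu, x + step)
        if prev * nxt < 0:
            lo, hi = x, x + step
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if bessel_j(nu, lo) * bessel_j(nu, mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            zeros.append(0.5 * (lo + hi))
        x += step
        prev = nxt
    return zeros

def weighted_overlap(nu, w1, w2, a, steps=400):
    """Simpson estimate of the integral over [0, a] of J_nu(w1 r) J_nu(w2 r) r dr."""
    h = a / steps  # steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        r = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * bessel_j(nu, w1 * r) * bessel_j(nu, w2 * r) * r
    return total * h / 3

def radial_gram(n, l, a, count):
    """Matrix of normalized overlaps of the first `count` radial eigenfunctions."""
    nu = n / 2 - 1 + l
    omegas = [z / a for z in bessel_zeros(nu, count)]  # quantized frequencies
    norms = [weighted_overlap(nu, w, w, a) ** -0.5 for w in omegas]
    return [
        [norms[i] * norms[j] * weighted_overlap(nu, omegas[i], omegas[j], a)
         for j in range(count)]
        for i in range(count)
    ]

if __name__ == "__main__":
    for row in radial_gram(n=3, l=0, a=2.0, count=3):
        print([round(v, 6) for v in row])
```

For $n = 3$ and $l = 0$ the order is $1/2$, where $J_{1/2}(x)$ is proportional to $\sin(x)/\sqrt{x}$, so the zeros fall at multiples of $\pi$; this gives a simple sanity check on the zero finder. The series-based `bessel_j` loses accuracy for large arguments, so the sketch is only meant for the first few zeros.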


Using the quantized values for the frequencies the eigenvalues for the shell operator are expressed as

$$\lambda_{k,l}(n, \rho) = (\omega_{k,l}\, \rho)^{n/2}\, Z_{n/2-1}(\omega_{k,l}\, \rho)$$ (EQ 159)

which when combined with the radial activation function yields the quantized eigenvalues for the full problem.

$$\Lambda_{k,l}(n) = \int_0^\infty \lambda_{k,l}(n, \rho)\, f(\rho)\, d\rho$$ (discrete radial eigenvalues). (EQ 160)

The complete solution of the eigenvalue problem for radial activation rules has been obtained. I now turn to the task of finding the inverse activation rule for radial activation functions.

10. See Gradshteyn, p. 634, eq. 5.54.1, for an expression for arbitrary Bessel functions. (EQ 157) holds for Z = J, Bessel functions of the first kind.

8.4 Inverse of radial activation rules


The inverse of the radial activation rule can now be expressed in terms of the spectral rep- 
resentation of the operator as previously stated in (EQ 26). Using the eigenvalues and ei- 
genvectors found in the last section we see 


becomes 


A l (x>*) 


.4- A. . 

k,l,m k, l 


(EQ 161) 


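$$A^{-1}(\hat{x}, x) = \sum_{k,l,m} \frac{\left[ R_{k,l}(\hat{r})\, Y_{l,m}(\hat{\Omega}) \right] \left[ R_{k,l}(r)\, Y_{l,m}(\Omega) \right]^*}{\Lambda_{k,l}}.$$ (EQ 162)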


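Rearranging the order of summation and collecting terms which are only a function of the index $m$ gives

$$A^{-1}(\hat{x}, x) = \sum_{k,l} \frac{R_{k,l}(\hat{r})\, R_{k,l}(r)}{\Lambda_{k,l}} \left\{ \sum_m Y_{l,m}(\hat{\Omega})\, Y^*_{l,m}(\Omega) \right\}.$$ (EQ 163)

The hyperspherical harmonics’ sum rule can now be applied to yield an expression in terms of Gegenbauer polynomials, $C$. The expression is

$$A^{-1}(\hat{x}, x) = \sum_{k,l} \frac{R_{k,l}(\hat{r})\, R_{k,l}(r)}{\Lambda_{k,l}} \cdot \frac{C_l(\hat{u} \cdot u)}{K_l}.$$ (EQ 164)

The index $k$ is applicable only to the first factor so regrouping gives

$$A^{-1}(\hat{x}, x) = \sum_l \frac{1}{K_l} \left( \sum_k \frac{R_{k,l}(\hat{r})\, R_{k,l}(r)}{\Lambda_{k,l}} \right) C_l(\hat{u} \cdot u).$$ (EQ 165)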


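One is now in a position to substitute the explicit Bessel function form for the radial dependence of the eigenvectors. This yields the expression

$$A^{-1}(\hat{x}, x) = (r \hat{r})^{1-n/2} \sum_l \frac{C_l(\hat{u} \cdot u)}{K_l} \sum_k \frac{Z_{n/2-1+l}(\omega_{k,l}\, r)\, Z_{n/2-1+l}(\omega_{k,l}\, \hat{r})}{\Lambda_{k,l} \int_0^a Z^2_{n/2-1+l}(\omega_{k,l}\, r')\, r'\, dr'}.$$ (EQ 166)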


One knows from symmetry the inverse, like the rule itself, must be a function only of the distance between its arguments. This is partly revealed by the argument of the Gegenbauer polynomial, which is a function only of the inner product between the directions of $\hat{x}$ and $x$. An orthogonal transformation of the space (rotation) applied to both $\hat{x}$ and $x$ leaves their lengths ($\hat{r}$, $r$) and the inner product ($\hat{u} \cdot u$) unchanged. The inverse is therefore invariant under such a transformation.

This completes the discussion of the inverse for radial activation rules. For each explicit form of radial dependency, $f(r)$, there will be an inverse activation rule as given by (EQ 166). Further work is necessary to investigate this expression for Gaussian, exponential, and spherical activation rules.


9.0 Summary and conclusions

The previous sections presented the derivations of the inverse activation rules for three associative memory models. I summarize the results here.


9.1 The Kanerva model

The Sparse Distributed Memory model (Kanerva, 1988) uses a hard threshold activation rule with binary input and binary weights. Any pattern within Hamming distance $r$ of the weights of a hidden node will activate the node with unit activity; otherwise the node’s activity will be zero. The Sparse Distributed Memory activation rule is written


$$A_{x,\hat{x}}(n, r) = \begin{cases} 1 & d_n(x, \hat{x}) \le r, \\ 0 & \text{otherwise.} \end{cases}$$ (activation rule) (EQ 167)


The inverse activation rule is not simple. 

$$A^{-1}_{x,\hat{x}}(n, r) = 2^{-n} \sum_{w=0}^{n} \frac{C_d(n-w, w)}{\sum_{p=0}^{r} C_p(n-w, w)}$$ (inverse activation rule) (EQ 168)


with 


C p (a,b) = £ (EQ 169) 


$d = d_n(x, \hat{x})$.

As can be seen from Figure 8 there is a general oscillatory behavior to the inverse (it bears 
some resemblance to the Fourier transform of a step function). This oscillatory behavior 
creates (in the multidimensional space) layers with alternating sign. The strength and sign 
of a layer determines the factor a datum will be multiplied by before being added into the 
memory in that region of space. 

That is, about each hidden node is an “onion” with layers of alternating sign that deter- 
mines how that node treats data to be written to its own local store (output weights). Un- 
like the activation rule that is localized about each hidden node, the effects of the inverse 
rule spread throughout the space. This is analogous to the behavior of a function and its 
Fourier transform where concentration of the function in a localized region produces a 
spreading in the transform space. Hence, learning with the inverse rule adjusts all of mem- 
ory. Another analogy of the inverse operator can be drawn to the “Mexican hat” type of 
on-center-off-surround neural response in the eye. Such a response can be understood as a 
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general neural property necessary to form sharp boundaries in vision. It localizes effects in 
a distributive system in the same way the inverse localizes memory. 
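
The sign-alternating, space-filling character of the inverse can be seen on a toy case. The sketch below is an illustration only, under the assumption that there is one hidden node at every point of $\{0,1\}^n$ for a small $n$. It does not use the notation of (EQ 168); instead it diagonalizes the distance-dependent activation matrix with the standard Walsh characters of the hypercube and returns the inverse kernel as a function of Hamming distance. The function names are hypothetical.

```python
def popcount(v):
    """Number of set bits in a non-negative integer."""
    return bin(v).count("1")

def activation(x, x_hat, r):
    """Hard-threshold rule: 1 if the Hamming distance is at most r, else 0."""
    return 1 if popcount(x ^ x_hat) <= r else 0

def inverse_by_distance(n, r):
    """Inverse kernel values for Hamming distances d = 0..n on the full n-cube.

    Points are n-bit integers. The eigenvalue belonging to Walsh character s is
    the sum over z of A(z, 0) * (-1)^(s . z); the inverse kernel is the
    character expansion with reciprocal eigenvalues.
    """
    size = 1 << n
    eig = [
        sum(activation(z, 0, r) * (-1) ** popcount(s & z) for z in range(size))
        for s in range(size)
    ]
    if any(e == 0 for e in eig):
        raise ValueError("activation matrix is singular for this (n, r)")
    inverse = []
    for d in range(n + 1):
        delta = (1 << d) - 1  # representative difference vector with d ones
        value = sum((-1) ** popcount(s & delta) / eig[s] for s in range(size)) / size
        inverse.append(value)
    return inverse

if __name__ == "__main__":
    print(inverse_by_distance(4, 1))
```

For $n = 4$, $r = 1$ the activation rule is nonzero only out to distance 1, while the returned inverse values are nonzero at every distance and change sign as the distance grows. Some combinations of $n$ and $r$ give a singular matrix on the finite cube, which the sketch reports rather than hides.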

9.2 The Selected-Coordinate Design 

The Selected-Coordinate Design (Jaeckel, 1989) is similar to the Kanerva model except 
that only a subset of an input pattern is attended to by each hidden node. This is accom- 
plished by including “don’t care” bits with the binary values that an address (input 
weights) of a hidden node can take. Such “sparse” sampling of input is also found in the 
sampling of mossy fibers by granule cells in the cerebellum (Marr, 1969). Another differ- 
ence is that among the “care” bits an exact match must occur for the hidden node to be ac- 
tivated. The activation rule for the Selected-Coordinate Design is 


A 


XyX 


in) 


fl * 1, 2, 

[0 otherwise, 


(activation rule) (EQ 170) 


$x, \hat{x} \in \{-1, 0, 1\}^n$.

The exact match condition makes it easy to derive the inverse of the Selected-Coordinate Design, where it is found

$$A^{-1}_{x,\hat{x}} = (1)^{k_1}\, (0)^{k_2}\, (-1)^{k_0}$$ (inverse activation rule). (EQ 171)

The $k_s$ are the counts of the number of components of $|x + \hat{x}|$ which equal $s$. This write rule has the property that only points, $x$, which are dissimilar to a hidden location, $\hat{x}$, are written to $\hat{x}$’s store. Data will be added to the store if the number of “care” bits in $\hat{x}$ is even, otherwise they will be subtracted.
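
The product form of this inverse can be checked by brute force on small cases. The sketch below rests on two assumptions that are not confirmed by the text: that both $x$ and $\hat{x}$ range over $\{-1, 0, 1\}^n$ with a 0 matching anything, and that the exponents in the inverse rule are read as $k_1$, $k_2$, $k_0$ (counts of components of $|x + \hat{x}|$ equal to 1, 2 and 0). The function names are hypothetical.

```python
from itertools import product

VALUES = (-1, 0, 1)

def activation(x, x_hat):
    """Assumed matching rule: 0 if any coordinate pair is (+1, -1) or (-1, +1), else 1."""
    return 0 if any(a * b == -1 for a, b in zip(x, x_hat)) else 1

def inverse_rule(x, x_hat):
    """(1)^k1 * (0)^k2 * (-1)^k0, with k_s the number of components of |x + x_hat| equal to s."""
    k = [0, 0, 0]
    for a, b in zip(x, x_hat):
        k[abs(a + b)] += 1
    if k[2] > 0:  # any exactly matching care bit zeroes the product
        return 0
    return -1 if k[0] % 2 else 1

def check_inverse(n):
    """True if the activation matrix times the inverse-rule matrix is the identity."""
    points = list(product(VALUES, repeat=n))
    for x in points:
        for y in points:
            total = sum(activation(x, z) * inverse_rule(z, y) for z in points)
            if total != (1 if x == y else 0):
                return False
    return True

if __name__ == "__main__":
    print([check_inverse(n) for n in (1, 2, 3)])
```

Because the assumed activation rule factorizes over coordinates, the full matrix is a Kronecker power of a 3-by-3 matrix, and its inverse is the Kronecker power of the 3-by-3 inverse; the brute-force check simply confirms this for small $n$.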

9.3 Radial activation rules 

Radially dependent rules for function interpolation have been considered in the past and a 
generalized form in the context of neural networks has been considered by Poggio (1989). 
Radial activation rules can be stated as 

$$A(x, \hat{x}) = f(\lVert x - \hat{x} \rVert)$$ (activation rule) (EQ 172)

where $f$ is an arbitrary real-valued function of a single variable. It is shown that the eigenvectors are radially Bessel functions and angularly Gegenbauer polynomials, and that the eigenvalues are a convolution of Bessel functions with the radial activation rule. The inverse activation rule for radial functions is

$$A^{-1}(\hat{x}, x) = (r \hat{r})^{1-n/2} \sum_l \frac{C_l(\hat{u} \cdot u)}{K_l} \sum_k \frac{Z_{n/2-1+l}(\omega_{k,l}\, r)\, Z_{n/2-1+l}(\omega_{k,l}\, \hat{r})}{\Lambda_{k,l} \int_0^a Z^2_{n/2-1+l}(\omega_{k,l}\, r')\, r'\, dr'}$$ (inverse activation rule) (EQ 173)

where

$$\Lambda_{k,l} = \int_0^\infty \lambda_{k,l}(n, \rho)\, f(\rho)\, d\rho, \qquad \omega_{k,l} = \frac{z_{k,l}}{a}, \qquad Z_{n/2-1+l}(z_{k,l}) = 0.$$ (EQ 174)


9.4 Conclusions 


The use of iterative error correction in artificial neural networks and associative memories 
appears to be a series expansion of some operator. When this intuition is rigorously formu- 
lated for asymptotically large single-layer neural networks it is found error correction, in- 
deed, is a series expansion of the inverse activation rule of the network. 
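
The link between iteration and a series expansion of an inverse can be illustrated on a small finite matrix. The sketch below is not the error-correction rule analyzed in this report; it uses the textbook iteration $W \leftarrow W + \eta (I - AW)$ as a stand-in, whose iterate after $t$ steps, starting from $W = 0$, equals the partial sum $\eta \sum_{k<t} (I - \eta A)^k$ of the Neumann series for $A^{-1}$. The matrix $A$ is a hypothetical Gaussian radial activation matrix on five equally spaced points, chosen so that the series converges with $\eta = 1$.

```python
import math

def gaussian_activation_matrix(points, width=1.0):
    """A[i][j] = exp(-((p_i - p_j) / width)^2) for scalar lattice points."""
    return [[math.exp(-((p - q) / width) ** 2) for q in points] for p in points]

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def iterative_inverse(a, steps, rate=1.0):
    """Repeat W <- W + rate * (I - A W), starting from W = 0."""
    m = len(a)
    eye = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    w = [[0.0] * m for _ in range(m)]
    for _ in range(steps):
        aw = matmul(a, w)
        w = [[w[i][j] + rate * (eye[i][j] - aw[i][j]) for j in range(m)] for i in range(m)]
    return w

def max_recall_error(a, w):
    """Largest absolute entry of A W - I."""
    aw = matmul(a, w)
    m = len(a)
    return max(abs(aw[i][j] - (1.0 if i == j else 0.0)) for i in range(m) for j in range(m))

if __name__ == "__main__":
    a = gaussian_activation_matrix([0, 1, 2, 3, 4])
    for steps in (1, 10, 50, 200):
        print(steps, max_recall_error(a, iterative_inverse(a, steps)))
```

The iteration only converges when every eigenvalue of $I - \eta A$ lies inside the unit circle. That holds for this well-separated Gaussian example but not for every activation matrix, so the step size and the choice of iteration are assumptions of the sketch.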

Closed form expressions for the inverse show information must be spread throughout 
memory in an oscillatory manner to cancel the effects of competing patterns. 

The presence of an inverse also shows write-rules exist for very rapid single-step learning. 

The methodology of iterative error correction does not provide an understanding of the form of the weights resulting from its application, whereas inverse activation rules specify precisely the weightings that must be applied to the data to give total recall. It is the knowledge that such exact specifications exist which is the major result of this paper.

The existence of an inverse in the asymptotic case causes one to seek the same type of analysis for practical distributive memories. The lattice of input observation points in such systems can act as a starting point of this analysis. The similarity of lattice points (as determined by the activation function between the points) forms a matrix that can be inverted under certain circumstances. The analysis of this case and others will be the subject of future work (Danforth, 1991).

It is expected that drastic methodological changes will not be necessary to increase the quality of recall and generalization in practical systems. It is simply a question of what changes should be applied. Theoretical analysis is the guide that can show us which passage will lead us to the appropriate changes.

In the domain of neural networks (still new and in search of solid mathematical principles), I found this research to be refreshingly concrete.